Abstract

To appear in Theory and Practice of Logic Programming (TPLP) 

This paper develops a declarative language, P-log, that combines logical and probabilistic arguments in its reasoning. Answer 
Set Prolog is used as the logical foundation, while causal Bayes nets serve as a probabilistic foundation. We give several
non-trivial examples and illustrate the use of P-log for knowledge representation and updating of knowledge. We argue that our
approach to updates is more appealing than existing approaches. We give sufficiency conditions for the coherency of P-log 
programs and show that Bayes nets can be easily mapped to coherent P-log programs. 

KEYWORDS: Logic programming, answer sets, probabilistic reasoning, Answer Set Prolog



1 Introduction 

The goal of this paper is to define a knowledge representation language allowing natural, elaboration tolerant 
representation of commonsense knowledge involving logic and probabilities. The result of this effort is a language 
called P-log. 

By a knowledge representation language, or KR language, we mean a formal language $L$ with an entailment relation
$E$ such that (1) statements of $L$ capture the meaning of some class of sentences of natural language, and (2) when
a set $S$ of natural language sentences is translated into a set $T(S)$ of statements of $L$, the formal consequences of
$T(S)$ under $E$ are translations of the informal, commonsense consequences of $S$.

One of the best known KR languages is predicate calculus, and this example can be used to illustrate several 
points. First, a KR language is committed to an entailment relation, but it is not committed to a particular inference 
algorithm. Research on inference mechanisms for predicate calculus, for example, is still ongoing while predicate
calculus itself has remained unchanged since the 1920s.

Second, the merit of a KR language is partly determined by the class of statements representable in it. Inference 
in predicate calculus, e.g., is very expensive, but it is an important language because of its ability to formalize a 
broad class of natural language statements, arguably including mathematical discourse. 

C. Baral, M. Gelfond and N. Rushton

Though representation of mathematical discourse is a problem solved to the satisfaction of many, representation of
other kinds of discourse remains an area of active research, including work on defaults, modal reasoning, temporal
reasoning, and varying degrees of certainty.

Answer Set Prolog (ASP) is a successful KR language with an extensive literature and an active community of
researchers. In the last decade ASP was shown to be a powerful tool capable of representing recursive definitions,
defaults, causal relations, special forms of self-reference, and other language constructs which occur frequently in
various non-mathematical domains (Baral 2003), and are difficult or impossible to express in classical logic and
other common formalisms. ASP is based on the answer set/stable model semantics (Gelfond et al. 1988) of logic
programs with default negation (commonly written as not), and has its roots in research on non-monotonic logics.
In addition to the default negation the language contains "classical" or "strong" negation (commonly written as $\neg$)
and "epistemic disjunction" (commonly written as or).

Syntactically, an ASP program is a collection of rules of the form: 

$l_0 \text{ or } \dots \text{ or } l_k \leftarrow l_{k+1}, \dots, l_m, \text{not } l_{m+1}, \dots, \text{not } l_n$

where $l$'s are literals, i.e. expressions of the form $p$ and $\neg p$ where $p$ is an atom. A rule with variables is viewed
as a schema - a shorthand notation for the set of its ground instantiations. Informally, a ground program $\Pi$ can
be viewed as a specification for the sets of beliefs which could be held by a rational reasoner associated with $\Pi$.
Such sets are referred to as answer sets. An answer set is represented by a collection of ground literals. In forming
answer sets the reasoner must be guided by the following informal principles:

1. One should satisfy the rules of $\Pi$. In other words, if one believes in the body of a rule, one must also believe in
its head.

2. One should not believe in contradictions. 

3. One should adhere to the rationality principle, which says: "Believe nothing you are not forced to believe." 

An answer set $S$ of a program satisfies a literal $l$ if $l \in S$; $S$ satisfies $\text{not } l$ if $l \notin S$; $S$ satisfies a disjunction if it
satisfies at least one of its members. We often say that if $p \in S$ then $p$ is believed to be true in $S$, if $\neg p \in S$ then $p$
is believed to be false in $S$. Otherwise $p$ is unknown in $S$. Consider, for instance, an ASP program $P_1$ consisting
of rules:



1. $p(a).$
2. $\neg p(b).$
3. $q(c) \leftarrow \text{not } p(c), \text{not } \neg p(c).$
4. $\neg q(c) \leftarrow p(c).$
5. $\neg q(c) \leftarrow \neg p(c).$



The first two rules of the program tell the agent associated with $P_1$ that he must believe that $p(a)$ is true and $p(b)$
is false. The third rule tells the agent to believe $q(c)$ if he believes neither truth nor falsity of $p(c)$. Since the agent
has reason to believe neither truth nor falsity of $p(c)$ he must believe $q(c)$. The last two rules require the agent
to include $\neg q(c)$ in an answer set if this answer set contains either $p(c)$ or $\neg p(c)$. Since there is no reason for
either of these conditions to be satisfied, the program will have the unique answer set $S_0 = \{p(a), \neg p(b), q(c)\}$. As
expected the agent believes that $p(a)$ and $q(c)$ are true and that $p(b)$ is false, and simply does not consider truth
or falsity of $p(c)$.

If $P_1$ were expanded by another rule:

6. $p(c) \text{ or } \neg p(c).$

the agent will have two possible sets of beliefs represented by answer sets $S_1 = \{p(a), \neg p(b), p(c), \neg q(c)\}$ and
$S_2 = \{p(a), \neg p(b), \neg p(c), \neg q(c)\}$.

Now $p(c)$ is not ignored. Instead the agent considers two possible answer sets, one containing $p(c)$ and another
containing $\neg p(c)$. Both, of course, contain $\neg q(c)$.

The example illustrates that the disjunction (6), read as "believe $p(c)$ to be true or believe $p(c)$ to be false", is
certainly not a tautology. It is often called the awareness axiom (for $p(c)$). The axiom prohibits the agent from
removing truth or falsity of $p(c)$ from consideration. Instead it forces him to consider the consequences of believing
$p(c)$ to be true as well as the consequences of believing it to be false.
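
The answer sets discussed above can be checked mechanically. The following is a minimal Python sketch, not part of
P-log or of any ASP solver: it brute-forces the answer sets of a small ground program by testing every consistent
set of literals against its reduct and keeping only the minimal ones. The encoding of a rule as a (head, positive
body, negated body) triple, the leading "-" for classical negation, and all function names are assumptions made
for illustration. The degenerate case of inconsistent programs is ignored.

```python
from itertools import chain, combinations

# A ground rule is (head, pos, neg): head is a tuple of literals read as a
# disjunction, pos are body literals, neg are body literals under "not".
# Classical negation is encoded by a leading "-", e.g. "-p(c)".
P1 = [
    (("p(a)",), (), ()),
    (("-p(b)",), (), ()),
    (("q(c)",), (), ("p(c)", "-p(c)")),
    (("-q(c)",), ("p(c)",), ()),
    (("-q(c)",), ("-p(c)",), ()),
]
AWARENESS = (("p(c)", "-p(c)"), (), ())  # rule 6: p(c) or -p(c)


def literals(program):
    """All literals occurring anywhere in the program."""
    return sorted({lit for h, p, n in program for lit in h + p + n})


def subsets(items):
    """Every subset of items, as tuples, smallest first."""
    return chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))


def consistent(s):
    """True if s contains no pair l, -l."""
    return not any(("-" + lit) in s for lit in s)


def satisfies(s, reduct):
    """Each reduct rule whose body holds in s has a head literal in s."""
    return all(any(h in s for h in head)
               for head, pos in reduct if set(pos) <= s)


def answer_sets(program):
    found = []
    for cand in map(frozenset, subsets(literals(program))):
        if not consistent(cand):
            continue
        # Reduct: drop rules blocked by "not l" with l in cand,
        # then drop the remaining "not" literals.
        reduct = [(h, p) for h, p, n in program if not set(n) & cand]
        if not satisfies(cand, reduct):
            continue
        # Rationality principle: no proper subset may satisfy the reduct.
        if any(satisfies(frozenset(sub), reduct)
               for sub in subsets(sorted(cand)) if len(sub) < len(cand)):
            continue
        found.append(set(cand))
    return found


print(answer_sets(P1))
print(answer_sets(P1 + [AWARENESS]))
```

Exhaustive enumeration is exponential in the number of literals, so this is only usable on toy programs such as
$P_1$; it is meant to make the three informal principles concrete, not to suggest how a solver works.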

The above intuition about the meaning of logical connectives of ASP$^1$ and that of the rationality principle is
formalized in the definition of an answer set of a logic program (see Appendix III). There is a substantial amount of
literature on the methodology of using the language of ASP for representing various types of (possibly incomplete)
knowledge (Baral 2003).

There are by now a large number of inference engines designed for various subclasses of ASP programs. For
example, a number of recently developed systems, called answer set solvers (Niemelä and Simons 1997; 2002;
Citrigno et al. 1997; Leone et al. 2006; Lierler 2005; Lin and Zhao 2004; Gebser et al. 2007), compute answer sets
of logic programs with finite Herbrand universes. Answer set programming, a programming methodology which
consists in reducing a computational problem to computing answer sets of a program associated with it, has been
successfully applied to solutions of various classical AI and CS tasks including planning, diagnostics, and
configuration (Baral 2003). As a second example, more traditional query-answering algorithms of logic programming,
including the SLDNF-based Prolog interpreter and its variants (Apt and Doets 1994; Chen, Swift and Warren 1995),
are sound with respect to the stable model semantics of programs without $\neg$ and or.

However, ASP recognizes only three truth values: true, false, and unknown. This paper discusses an augmentation 
of ASP with constructs for representing varying degrees of belief. The objective of the resulting language is to 
allow elaboration tolerant representation of commonsense knowledge involving logic and probabilities. P-log was 
first introduced in (Baral et al. 2004), but much of the material here is new, as discussed in the concluding section
of this paper. 

A prototype implementation of P-log exists and has been used in promising experiments comparing its performance 
with existing approaches (Gelfond et al. 2006). However, the focus of this paper is not on algorithms, but on precise
declarative semantics for P-log, basic mathematical properties of the language, and illustrations of its use. Such 
semantics are prerequisite for serious research in algorithms related to the language, because they give a definition 
with respect to which correctness of algorithms can be judged. As a declarative language, P-log stands ready to 
borrow and combine existing and future algorithms from fields such as answer set programming, satisfiability 
solvers, and Bayesian networks. 

P-log extends ASP by adding probabilistic constructs, where probabilities are understood as a measure of the 
degree of an agent's belief. This extension is natural because the intuitive semantics of an ASP program is given 
in terms of the beliefs of a rational agent associated with it. In addition to the usual ASP statements, the P-log 
programmer may declare "random attributes" (essentially random variables) of the form $a(X)$ where $X$ and the
value of $a(X)$ range over finite domains. Probabilistic information about possible values of $a$ is given through
causal probability atoms, or pr-atoms. A pr-atom takes roughly the form

$pr_r(a(t) = y \mid_c B) = v$

where $a(t)$ is a random attribute, $B$ a set of literals, and $v \in [0, 1]$. The statement says that if the value of $a(t)$ is
fixed by experiment $r$, and $B$ holds, then the probability that $r$ causes $a(t) = y$ is $v$.



$^1$ It should be noted that the connectives of Answer Set Prolog are different from those of Propositional Logic.
A P-log program consists of its logical part and its probabilistic part. The logical part represents knowledge which
determines the possible worlds of the program, including ASP rules and declarations of random attributes, while
the probabilistic part contains pr-atoms which determine the probabilities of those worlds. If $\Pi$ is a P-log program,
the semantics of P-log associates the logical part of $\Pi$ with a "pure" ASP program $\tau(\Pi)$. The semantics of a ground
$\Pi$ is then given by

(i) a collection of answer sets of $\tau(\Pi)$ viewed as the possible sets of beliefs of a rational agent associated with $\Pi$,
and

(ii) a measure over the possible worlds defined by the collection of the probability atoms of $\Pi$ and the principle of
indifference which says that possible values of random attribute $a$ are assumed to be equally probable if we have
no reason to prefer one of them to any other.

As a simple example, consider the program 

$a : \{1, 2, 3\}.$
$random(a).$
$pr(a = 1) = 1/2.$

This program defines a random attribute $a$ with possible values 1, 2, and 3. The program's possible worlds are
$W_1 = \{a = 1\}$, $W_2 = \{a = 2\}$, and $W_3 = \{a = 3\}$. In accordance with the probability atom of the program, the
probability measure $\mu(W_1) = 1/2$. By the principle of indifference $\mu(W_2) = \mu(W_3) = 1/4$.
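
The arithmetic behind this example can be sketched in a few lines of Python. This is an illustration for the
single-attribute case only, under the assumption that each possible world fixes exactly one value of $a$; the
function name `measure` and its dictionary-based interface are hypothetical and not part of P-log.

```python
from fractions import Fraction


def measure(values, assigned):
    """Give each value its pr-atom probability if one exists; otherwise
    split the remaining mass equally (principle of indifference)."""
    free = [v for v in values if v not in assigned]
    remaining = 1 - sum(assigned.values(), Fraction(0))
    default = remaining / len(free) if free else Fraction(0)
    return {v: assigned.get(v, default) for v in values}


# a : {1, 2, 3}.  random(a).  pr(a = 1) = 1/2.
mu = measure([1, 2, 3], {1: Fraction(1, 2)})
for value, prob in mu.items():
    print(f"mu(a = {value}) = {prob}")
```

Exact fractions are used instead of floating point so that the values can be compared directly with 1/2 and 1/4.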

This paper is concerned with defining the syntax and semantics of P-log, and a methodology of its use for
knowledge representation. Whereas much of the current research in probabilistic logical languages focuses on learning,
our main purpose, by contrast, is to elegantly and straightforwardly represent knowledge requiring subtle logical
and probabilistic reasoning. A limitation of the current version of P-log is that we limit the discussion to models
with finite Herbrand domains. This is common for ASP and its extensions. A related limitation prohibits
programs containing an infinite number of random selections (and hence an uncountable number of possible worlds).
This means P-log cannot be used, for example, to describe stochastic processes whose time domains are infinite.
However, P-log can be used to describe initial finite segments of such processes, and this paper gives two small
examples of such descriptions (Sections 5.3 and 5.4) and discusses one large example in Section 5.3. We believe the
techniques used by (Sato 1993) can be used to extend the semantics of P-log to account for programs with infinite
Herbrand domains. The resulting language would, of course, allow representation of processes with infinite time
domains. Even though such an extension is theoretically not difficult, its implementation requires further research in
ASP solvers. This matter is a subject of future work. In this paper we do not emphasize P-log inference algorithms
even for programs with finite Herbrand domains, though this is also an obvious topic for future work. However, our
prototype implementation of P-log, based on the answer set solver Smodels (Niemelä and Simons 1997), already
works rather efficiently for programs with a large and complex logical component and a comparatively small number
of random attributes.

The existing implementation of P-log was successfully used, for instance, in an industrial size application
for diagnosing faults in the reactive control system (RCS) of the space shuttle (Balduccini et al. 2001;
Balduccini et al. 2002). The RCS is the Shuttle's system that has primary responsibility for maneuvering the
aircraft while it is in space. It consists of fuel and oxidizer tanks, valves, and other plumbing needed to provide
propellant to the maneuvering jets of the Shuttle. It also includes electronic circuitry: both to control the valves
in the fuel lines and to prepare the jets to receive firing commands. Overall, the system is rather complex, in that
it includes 12 tanks, 44 jets, 66 valves, 33 switches, and around 160 computer commands (computer-generated
signals).

We believe that P-log has some distinctive features which can be of interest to those who use probabilities. First,
P-log probabilities are defined by their relation to a knowledge base, represented in the form of a P-log program.
Hence we give an account of the relationship between probabilistic models and the background knowledge on
which they are based. Second, P-log gives a natural account of how degrees of belief change with the addition
of new knowledge. For example, the standard definition of conditional probability in our framework becomes 
a theorem, relating degrees of belief computed from two different knowledge bases, in the special case where 
one knowledge base is obtained from the other by the addition of observations which eliminate possible worlds. 
Moreover, P-log can accommodate updates which add rules to a knowledge base, including defaults and rules 
introducing new terms. 

Another important feature of P-log is its ability to distinguish between conditioning on observations and on
deliberate actions. The distinction was first explicated in (Pearl 2000), where, among other things, the author discusses
relevance of the distinction to answering questions about desirability of various actions (Simpson's paradox,
discussed in Section 5.2, gives a specific example of such a situation). In Pearl's approach the effect of a deliberate
action is modeled by an operation on a graph representing causal relations between random variables of a domain.
In our approach, the semantics of conditioning on actions is axiomatized using ASP's default negation, and these
axioms are included as part of the translation of programs from P-log to ASP. Because Pearl's theory of causal
Bayesian nets (CBN's) acts as the probabilistic foundation of P-log, CBN's are defined precisely in Appendix II,
where it is shown that each CBN maps in a natural way to a P-log program.

The last characteristic feature of P-log we would like to mention here is its probabilistic non-monotonicity — 
that is, the ability of the reasoner to change his probabilistic model as a result of new information. Normally
any solution of a probabilistic problem starts with construction of a probabilistic model of a domain. The model
consists of a collection of possible worlds and the corresponding probability measure, which together determine 
the degrees of the reasoner's beliefs. In most approaches to probability, new information can cause a reasoner to 
abandon some of his possible worlds. Hence, the effect of update is monotonic, i.e. it can only eliminate possible 
worlds. Formalisms in which an update can cause creation of new possible worlds are called "probabilistically 
non-monotonic". We claim that non-monotonic probabilistic systems such as P-log can nicely capture changes in 
the reasoner's probabilistic models. 

To clarify the argument let us informally consider the following P-log program (a more elaborate example involving
a Moving Robot will be given in Section 5.3).

$a : \{1, 2, 3\}.$
$a = 1 \leftarrow \text{not } abnormal.$
$random(a) \leftarrow abnormal.$

Here $a$ is an attribute with possible values 1, 2, and 3. The second rule of the program says that normally the value
of $a$ is 1. The third rule tells us that under abnormal circumstances $a$ will randomly take on one of its possible
values. Since the program contains no atom $abnormal$ the second rule concludes $a = 1$. This is the only possible
world of the program, $\mu(a = 1) = 1$, and hence the value of $a$ is 1 with probability 1. Suppose, however, that
the program is expanded by an atom $abnormal$. This time the second rule is not applicable, and the program has
three possible worlds: $W_1 = \{a = 1\}$, $W_2 = \{a = 2\}$, and $W_3 = \{a = 3\}$. By the principle of indifference
$\mu(W_1) = \mu(W_2) = \mu(W_3) = 1/3$ - attribute $a$ takes on value 1 with probability 1/3.
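
The change of model described above can be mimicked with a small Python sketch. It hard-codes the two rules of
this particular program rather than interpreting P-log, and the function name `possible_worlds` and its boolean
parameter are assumptions made for illustration.

```python
from fractions import Fraction

VALUES = [1, 2, 3]  # a : {1, 2, 3}.


def possible_worlds(abnormal):
    """Map each possible value of a to its probability."""
    if abnormal:
        # random(a) <- abnormal: every value in the range is possible.
        worlds = VALUES
    else:
        # a = 1 <- not abnormal: the default fixes the value.
        worlds = [1]
    # No pr-atoms are given, so the principle of indifference applies.
    return {w: Fraction(1, len(worlds)) for w in worlds}


print(possible_worlds(abnormal=False))
print(possible_worlds(abnormal=True))
```

Adding the fact `abnormal` enlarges the set of possible worlds from one to three, which is the sense in which the
update is probabilistically non-monotonic.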

The rest of the paper is organized as follows. In Section 2 we give the syntax of P-log and in Section 3 we give its
semantics. In Section 4 we discuss updates of P-log programs. Section 5 contains a number of examples of the use
of P-log for knowledge representation and reasoning. The emphasis here is on demonstrating the power of P-log
and the methodology of its use. In Section 6 we present sufficiency conditions for consistency of P-log programs
and use them to show how Bayes nets are special cases of consistent P-log programs. Section 7 contains a discussion
of the relationship between P-log and other languages combining probability and logic programming. Section
8 discusses conclusions and future work. Appendix I contains the proofs of the major theorems, and Appendix
II contains background material on causal Bayesian networks. Appendix III contains the definition and a short
discussion of the notion of an answer set of a logic program.
2 Syntax of P-log

A probabilistic logic program (P-log program) $\Pi$ consists of (i) a sorted signature, (ii) a declaration, (iii) a regular
part, (iv) a set of random selection rules, (v) a probabilistic information part, and (vi) a set of observations and
actions. Every statement of P-log must be ended by a period.

(i) Sorted Signature: The sorted signature $\Sigma$ of $\Pi$ contains a set $O$ of objects and a set $F$ of function symbols.
The set $F$ is a union of two disjoint sets, $F_r$ and $F_a$. Elements of $F_r$ are called term building functions. Elements
of $F_a$ are called attributes.

Terms of P-log are formed in a usual manner using function symbols from $F_r$ and objects from $O$. Expressions of
the form $a(\bar{t})$, where $a$ is an attribute and $\bar{t}$ is a vector of terms of the sorts required by $a$, will be referred to as
attribute terms. (Note that attribute terms are not terms.) Attributes with the range $\{true, false\}$ are referred to as
Boolean attributes or relations. We assume that the number of terms and attributes over $\Sigma$ is finite. Note that, since
our signature is sorted, this does not preclude the use of function symbols. The example in Section 5.3 illustrates
such a use.

Atomic statements are of the form $a(\bar{t}) = t_0$, where $t_0$ is a term, $\bar{t}$ is a vector of terms, and $a$ is an attribute (we
assume that $t_0$ and $\bar{t}$ are of the sorts required by $a$). An atomic statement, $p$, or its negation, $\neg p$, is referred to as a
literal (or $\Sigma$-literal, if $\Sigma$ needs to be emphasized); literals $p$ and $\neg p$ are called contrary; by $\bar{l}$ we denote the literal
contrary to $l$; expressions $l$ and $\text{not } l$, where $l$ is a literal and not is the default negation of Answer Set Prolog,
are called extended literals. Literals of the form $a(\bar{t}) = true$, $a(\bar{t}) = false$, and $\neg(a(\bar{t}) = t_0)$ are often written as
$a(\bar{t})$, $\neg a(\bar{t})$, and $a(\bar{t}) \neq t_0$ respectively. If $p$ is a unary relation and $X$ is a variable then an expression of the form
$\{X : p(X)\}$ will be called a set-term. Occurrences of $X$ in such an expression are referred to as bound.

Terms and literals are normally denoted by (possibly indexed) letters $t$ and $l$ respectively. The letters $c$ and $a$,
possibly with indices, are used as generic names for sorts and attributes. Other lower case letters denote objects.
Capital letters normally stand for variables.

Similar to Answer Set Prolog, a P-log statement containing unbound variables is considered a shorthand for the 
set of its ground instances, where a ground instance is obtained by replacing unbound occurrences of variables 
with properly sorted ground terms. Sorts in a program are indicated by the declarations of attributes (see below). 
In defining semantics of our language we limit our attention to finite programs with no unbound occurrences of 
variables. We sometimes refer to programs without unbound occurrences of variables as ground. 

(ii) Declaration: The declaration of a P-log program is a collection of definitions of sorts and sort declarations for 
attributes. 

A sort $c$ can be defined by explicitly listing its elements,

$c = \{x_1, \dots, x_n\}.$   (1)

or by a logic program $T$ with a unique answer set $A$. In the latter case $x \in c$ iff $c(x) \in A$.
The domain and range of an attribute $a$ are given by a statement of the form:

$a : c_1 \times \dots \times c_n \rightarrow c_0.$   (2)

For attributes without parameters we simply write $a : c_0$.
The following example will be used throughout this section. 



Example 1
[Dice Example: program component $D_1$]

Consider a domain containing two dice owned by Mike and John respectively. Each of the dice will be rolled once.
A P-log program $\Pi_0$ modeling the domain will have a signature $\Sigma$ containing the names of the two dice, $d_1$ and
$d_2$, an attribute $roll$ mapping each die to the value it indicates when thrown, which is an integer from 1 to 6, an
attribute $owner$ mapping each die to a person, relation $even(D)$, where $D$ ranges over dice, and "imported" or
"predefined" arithmetic functions $+$ and $mod$. The corresponding declarations, $D_1$, will be as follows:

$dice = \{d_1, d_2\}.$
$score = \{1, 2, 3, 4, 5, 6\}.$
$person = \{mike, john\}.$
$roll : dice \rightarrow score.$
$owner : dice \rightarrow person.$
$even : dice \rightarrow Boolean.$ □

(iii) Regular part: The regular part of a P-log program consists of a collection of rules of Answer Set Prolog
(without disjunction) formed using literals of $\Sigma$.

Example 2
[Dice Example (continued): program component $D_2$]

For instance, the regular part $D_2$ of program $\Pi_0$ may contain the following rules:

$owner(d_1) = mike.$
$owner(d_2) = john.$
$even(D) \leftarrow roll(D) = Y, Y \bmod 2 = 0.$
$\neg even(D) \leftarrow \text{not } even(D).$

Here $D$ and $Y$ range over dice and score respectively. □

(iv) Random Selection: This section contains rules describing possible values of random attributes. More precisely
a random selection is a rule of the form

$[\,r\,]\ random(a(\bar{t}) : \{X : p(X)\}) \leftarrow B.$   (3)

where $r$ is a term used to name the rule and $B$ is a collection of extended literals of $\Sigma$. The name $[\,r\,]$ is optional
and can be omitted if the program contains exactly one random selection for $a(\bar{t})$. Sometimes we refer to $r$ as
an experiment. Statement (3) says that if $B$ holds, the value of $a(\bar{t})$ is selected at random from the set
$\{X : p(X)\} \cap range(a)$ by experiment $r$, unless this value is fixed by a deliberate action. If $B$ in (3) is empty we simply
write

$[\,r\,]\ random(a(\bar{t}) : \{X : p(X)\}).$   (4)

If $\{X : p(X)\}$ is equal to the $range(a)$ then rule (3) may be written as

$[\,r\,]\ random(a(\bar{t})) \leftarrow B.$   (5)

Sometimes we refer to the attribute term $a(\bar{t})$ as random and to $\{X : p(X)\} \cap range(a)$ as the dynamic range of
$a(\bar{t})$ via rule $r$. We also say that a literal $a(\bar{t}) = y$ occurs in the head of (3) for every $y \in range(a)$, and that any
ground instance of $p(X)$ and literals occurring in $B$ occur in the body of (3).
Example 3
[Dice Example (continued)]

The fact that values of attribute $roll : dice \rightarrow score$ are random is expressed by the statement

$[\,r(D)\,]\ random(roll(D)).$ □

(v) Probabilistic Information: Information about probabilities of random attributes taking particular values is
given by probability atoms (or simply pr-atoms) which have the form:

$pr_r(a(\bar{t}) = y \mid_c B) = v.$   (6)

where $v \in [0, 1]$, $B$ is a collection of extended literals, $pr$ is a special symbol not belonging to $\Sigma$, $r$ is the name of
a random selection rule for $a(\bar{t})$, and $pr_r(a(\bar{t}) = y \mid_c B) = v$ says that if the value of $a(\bar{t})$ is fixed by experiment
$r$, and $B$ holds, then the probability that $r$ causes $a(\bar{t}) = y$ is $v$. (Note that here we use 'cause' in the sense that
$B$ is an immediate or proximate cause of $a(\bar{t}) = y$, as opposed to an indirect cause.) If $W$ is a possible world
of a program containing (6) and $W$ satisfies both $B$ and the body of rule $r$, then we will refer to $v$ as the causal
probability of the atom $a(\bar{t}) = y$ in $W$.

We say that a literal $a(\bar{t}) = y$ occurs in the head of (6), and that literals occurring in $B$ occur in the body of (6).
If $B$ is empty we simply write

$pr_r(a(\bar{t}) = y) = v.$   (7)

If the program contains exactly one rule generating values of $a(\bar{t}) = y$ the index $r$ may be omitted.

Example 4
[Dice Example (continued): program component $D_3$]

For instance, the dice domain may include $D_3$ consisting of the random declaration of $roll(D)$ given in Example 3
and the following probability atoms:

$pr(roll(D) = Y \mid_c owner(D) = john) = 1/6.$
$pr(roll(D) = 6 \mid_c owner(D) = mike) = 1/4.$
$pr(roll(D) = Y \mid_c Y \neq 6, owner(D) = mike) = 3/20.$

The above probability atoms convey that the die owned by John is fair, while the die owned by Mike is biased to
roll 6 with probability .25. □
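
As a sanity check on these numbers, the causal probabilities for each die should sum to 1 over the six scores. The
short Python sketch below encodes the three probability atoms as a function; the function name
`causal_probability` and the use of plain strings for owners are assumptions made for illustration.

```python
from fractions import Fraction

SCORES = range(1, 7)  # score = {1, ..., 6}.


def causal_probability(owner, y):
    """Probability that the die of the given owner shows score y."""
    if owner == "john":
        return Fraction(1, 6)            # fair die
    if owner == "mike":
        # biased die: 6 has probability 1/4, each other score 3/20
        return Fraction(1, 4) if y == 6 else Fraction(3, 20)
    raise ValueError(f"unknown owner: {owner}")


for owner in ("john", "mike"):
    total = sum(causal_probability(owner, y) for y in SCORES)
    print(owner, total)
```

For Mike's die the total is 1/4 + 5 * 3/20 = 1, so the three atoms define a proper distribution for each die.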

(vi) Observations and actions: Observations and actions are statements of the respective forms

$obs(l).$   $do(a(\bar{t}) = y).$

where $l$ is a literal. Observations are used to record the outcomes of random events, i.e., random attributes, and
attributes dependent on them. The dice domain may, for instance, contain $\{obs(roll(d_1) = 4)\}$ recording the
outcome of rolling die $d_1$. The statement $do(a(\bar{t}) = y)$ indicates that $a(\bar{t}) = y$ is made true as a result of a
deliberate (non-random) action. For instance, $\{do(roll(d_1) = 4)\}$ may indicate that $d_1$ was simply put on the table
in the described position. Similarly, we may have $obs(even(d_1))$. Here, even though $even(d_1)$ is not a random
attribute, it is dependent on the random attribute $roll(d_1)$. If $B$ is a collection of literals, $obs(B)$ denotes the set
$\{obs(l) \mid l \in B\}$. Similarly for $do$.

The precise meaning of $do$ and $obs$ is captured by axioms (9)-(13) in the next section and discussed in Example
[TSl and in connection with Simpson's Paradox in Section 5.2. More discussion of the difference between actions
and observations in the context of probabilistic reasoning can be found in (Pearl 2000).
Note that limiting observable formulas to literals is not essential. It is caused by the syntactic restriction of Answer
Set Prolog which prohibits the use of arbitrary formulas. The restriction could be lifted if instead of Answer Set
Prolog we were to consider, say, its dialect from (Lifschitz et al. 1999). For the sake of simplicity we decided to
stay with the original definition of Answer Set Prolog.

A P-log program $\Pi$ can be viewed as consisting of two parts. The logical part, which is formed by declarations,
regular rules, random selections, actions and observations, defines possible worlds of $\Pi$. The probabilistic part
consisting of probability atoms defines a measure over the possible worlds, and hence defines the probabilities of
formulas. (If no probabilistic information on the number of possible values of a random attribute is available we
assume that all these values are equally probable.)



3 Semantics of P-log 

The semantics of a ground P-log program $\Pi$ is given by a collection of the possible sets of beliefs of a rational
agent associated with $\Pi$, together with their probabilities. We refer to these sets as possible worlds of $\Pi$. We will
define the semantics in two stages. First we will define a mapping of the logical part of $\Pi$ into its Answer Set
Prolog counterpart, $\tau(\Pi)$. The answer sets of $\tau(\Pi)$ will play the role of possible worlds of $\Pi$. Next we will use the
probabilistic part of $\Pi$ to define a measure over the possible worlds, and the probabilities of formulas.



3.1 Defining possible worlds: 
The logical part of a P-log program $\Pi$ is translated into an Answer Set Prolog program $\tau(\Pi)$ in the following way.

1. Sort declarations: For every sort declaration $c = \{x_1, \dots, x_n\}$ of $\Pi$, $\tau(\Pi)$ contains $c(x_1), \dots, c(x_n)$.
For all sorts that are defined using an Answer Set Prolog program $T$ in $\Pi$, $\tau(\Pi)$ contains $T$.

2. Regular part:

In what follows (possibly indexed) variables $Y$ are free variables. A rule containing these variables will be
viewed as shorthand for a collection of its ground instances with respect to the appropriate typing.

(a) For each rule $r$ in the regular part of $\Pi$, $\tau(\Pi)$ contains the rule obtained by replacing each occurrence
of an atom $a(\bar{t}) = y$ in $r$ by $a(\bar{t}, y)$.

(b) For each attribute term $a(\bar{t})$, $\tau(\Pi)$ contains the rule:

$\neg a(\bar{t}, Y_1) \leftarrow a(\bar{t}, Y_2), Y_1 \neq Y_2.$   (8)

which guarantees that in each answer set $a(\bar{t})$ has at most one value.

3. Random selections:

(a) For an attribute $a$, we have the rule:

$intervene(a(\bar{t})) \leftarrow do(a(\bar{t}, Y)).$   (9)

Intuitively, $intervene(a(\bar{t}))$ means that the value of $a(\bar{t})$ is fixed by a deliberate action. Semantically,
$a(\bar{t})$ will not be considered random in possible worlds which satisfy $intervene(a(\bar{t}))$.

(b) Each random selection rule of the form

$[\,r\,]\ random(a(\bar{t}) : \{Z : p(Z)\}) \leftarrow B.$

with $range(a) = \{y_1, \dots, y_k\}$ is translated to the following rules in Answer Set Prolog

$a(\bar{t}, y_1) \text{ or } \dots \text{ or } a(\bar{t}, y_k) \leftarrow B, \text{not } intervene(a(\bar{t})).$   (10)

$^2$ Our P-log implementation uses an equivalent rule $1\{a(\bar{t}, Z) : c_0(Z) : p(Z)\}1 \leftarrow B, \text{not } intervene(a(\bar{t}))$ from the input language of
Smodels.
If the dynamic range of $a$ in the selection rule is not equal to its static range, i.e. expression $\{Z : p(Z)\}$
is not omitted, then we also add the rule

$\leftarrow a(\bar{t}, y), \text{not } p(y), B, \text{not } intervene(a(\bar{t})).$   (11)

Rule (10) selects the value of $a(\bar{t})$ from its range while rule (11) ensures that the selected value satisfies
$p$.



4. $\tau(\Pi)$ contains actions and observations of $\Pi$.

5. For each $\Sigma$-literal $l$, $\tau(\Pi)$ contains the rule:

$\leftarrow obs(l), \text{not } l.$   (12)

6. For each atom $a(\bar{t}) = y$, $\tau(\Pi)$ contains the rule:

$a(\bar{t}, y) \leftarrow do(a(\bar{t}, y)).$   (13)

The rule (12) guarantees that no possible world of the program fails to satisfy observation $l$. The rule (13)
makes sure the atoms that are made true by the action are indeed true.

This completes our definition of T(n). 

Before we proceed with some additional definitions let us comment on the difference between rules (12) and (13).
Since the P-log programs $T \cup obs(l)$ and $T \cup \{\leftarrow \text{not } l\}$ have possible worlds which are identical except for
possible occurrences of $obs(l)$, the new observation simply eliminates some of the possible worlds of $T$. This reflects
understanding of observations in classical probability theory. In contrast, due to the possible non-monotonicity of
the regular part of $T$, possible worlds of $T \cup do(l)$ can be substantially different from those of $T$ (as opposed to
merely fewer in number), as we will illustrate in Section 5.3.

Definition 1 
[Possible worlds] 

An answer set of $\tau(\Pi)$ is called a possible world of $\Pi$. □

The set of all possible worlds of $\Pi$ will be denoted by $\Omega(\Pi)$. When $\Pi$ is clear from context we will simply write
$\Omega$. Note that due to our restriction on the signature of P-log programs possible worlds of $\Pi$ are always finite.



Example 5

[Dice example continued: P-log program $T_1$]

Let $T_1$ be a P-log program consisting of $D_1$, $D_2$ and $D_3$ described in Examples 1, 2, 3 and 4. The Answer Set
Prolog counterpart $\tau(T_1)$ of $T_1$ will consist of the following rules:

dice(d1). dice(d2). score(1). score(2).
score(3). score(4). score(5). score(6).
person(mike). person(john).
owner(d1, mike). owner(d2, john).

even(D) ← roll(D, Y), Y mod 2 = 0.

¬even(D) ← not even(D).

intervene(roll(D)) ← do(roll(D, Y)).

roll(D, 1) or ... or roll(D, 6) ← B, not intervene(roll(D)).

¬roll(D, Y1) ← roll(D, Y2), Y1 ≠ Y2.
¬owner(D, P1) ← owner(D, P2), P1 ≠ P2.
¬even(D, B1) ← even(D, B2), B1 ≠ B2.
← obs(roll(D, Y)), not roll(D, Y).
← obs(¬roll(D, Y)), not ¬roll(D, Y).
roll(D, Y) ← do(roll(D, Y)).

The translation also contains similar obs and do axioms for other attributes which have been omitted here. 

The variables D, P, B's, and Y's range over dice, person, boolean, and score respectively. (In the input language
of Lparse used by Smodels (Niemelä and Simons 1997) and several other answer set solving systems this typing
can be expressed by the statement

#domain dice(D), person(P), score(Y).

Alternatively c(X) can be added to the body of every rule containing variable X with domain c. In the rest of the
paper we will ignore these details and simply use Answer Set Prolog with the typed variables as needed.)

It is easy to check that $\tau(T_1)$ has 36 answer sets which are possible worlds of P-log program $T_1$. Each such world
contains a possible outcome of the throws of the dice, e.g. roll(d1, 6), roll(d2, 3). □
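
The count of 36 can be checked with a short Python sketch. It is not an answer set solver: it assumes that the only freedom in a possible world of the dice program is the pair of scores selected by the disjunctive rule for roll, and simply enumerates those pairs. The names `DICE`, `SCORES`, `possible_worlds` and `is_even` are illustrative and do not come from the P-log implementation.

```python
from itertools import product

DICE = ("d1", "d2")      # the two dice of the example
SCORES = range(1, 7)     # score(1), ..., score(6)

def possible_worlds():
    """Yield one dict per possible world, mapping each die to its score."""
    for scores in product(SCORES, repeat=len(DICE)):
        yield dict(zip(DICE, scores))

def is_even(world, die):
    """Mirror of the rule even(D) <- roll(D, Y), Y mod 2 = 0."""
    return world[die] % 2 == 0

worlds = list(possible_worlds())
print(len(worlds))                    # number of possible worlds
print({"d1": 6, "d2": 3} in worlds)   # the outcome mentioned in the text
```

Each dictionary lists only the random atoms of a world; the sort facts, the owner facts and the derived even atoms are the same function of these choices in every world.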



3.2 Assigning measures of probability: 

There are certain reasonableness criteria which we would like our programs to satisfy. These are normally easy to 
check for P-log programs. However, the conditions are described using quantification over possible worlds, and so 
cannot be axiomatized in Answer Set Prolog. We will state them as meta-level conditions, as follows (from this 
point forward we will limit our attention to programs satisfying these criteria): 

Condition 1 

[Unique selection rule] 
If rules

$$[\,r_1\,]\ \mathit{random}(a(t) : \{Y : p_1(Y)\}) \leftarrow B_1.$$

$$[\,r_2\,]\ \mathit{random}(a(t) : \{Y : p_2(Y)\}) \leftarrow B_2.$$

belong to $\Pi$ then no possible world of $\Pi$ satisfies both $B_1$ and $B_2$. □

The above condition follows from the intuitive reading of random selection rules. In particular, there cannot be two 
different random experiments each of which determines the value of the same attribute. 

Condition 2 

[Unique probability assignment] 

If $\Pi$ contains a random selection rule

$$[\,r\,]\ \mathit{random}(a(t) : \{Y : p(Y)\}) \leftarrow B.$$

along with two different probability atoms

$$\mathit{pr}_r(a(t) = y \mid_c B_1) = v_1 \quad \text{and} \quad \mathit{pr}_r(a(t) = y \mid_c B_2) = v_2$$

then no possible world of $\Pi$ satisfies $B$, $B_1$, and $B_2$. □






C. Baral, M. Gelfond and N. Rushton 



The justification of Condition 2 is as follows: If the conditions $B_1$ and $B_2$ can possibly both hold, and we do not
have $v_1 = v_2$, then the intuitive readings of the two pr-atoms are contradictory. On the other hand if $v_1 = v_2$, the
same information is represented in multiple locations in the program, which is bad for maintenance and extension
of the program.

Note that we can still represent situations where the value of an attribute is determined by multiple possible causes,
as long as the attribute is not explicitly random. To illustrate this point let us consider a simple example from
(Vennekens et al. 2006).

Example 6 

[Multiple Causes: Russian roulette with two guns] 

Consider a game of Russian roulette with two six-chamber guns. Each of the guns is loaded with a single bullet. 
What is the probability of the player dying if he fires both guns? 

Note that in this example pulling the trigger of the first gun and pulling the trigger of the second gun are two 
independent causes of the player's death. That is, the mechanisms of death from each of the two guns are separate 
and do not influence each other. 

The logical part of the story can be encoded by the following P-log program $\Pi_g$:

gun = {1, 2}.

pull_trigger : gun → boolean. % pull_trigger(G) says that the player pulls the trigger of gun G.

fatal : gun → boolean. % fatal(G) says that the bullet from gun G is sufficient to kill the player.

is_dead : boolean. % is_dead says that the player is dead.

[r(G)] : random(fatal(G)) ← pull_trigger(G).

is_dead ← fatal(G).

¬is_dead ← not is_dead.

pull_trigger(G).

Here the value of the random attribute fatal(1), which stands for "Gun 1 causes a wound sufficient to kill the
player", is generated at random by rule r(1). Similarly for fatal(2). The attribute is_dead, which stands for the
death of the player, is described in terms of fatal(G) and hence is not explicitly random. To define the probability
of fatal(G) we will assume that when the cylinder of each gun is spun, each of the six chambers is equally likely
to fall under the hammer. Thus,

$\mathit{pr}_{r(1)}(\mathit{fatal}(1)) = 1/6.$
$\mathit{pr}_{r(2)}(\mathit{fatal}(2)) = 1/6.$

Intuitively the probability of the player's death will be 11/36. At the end of this section we will learn how to
compute this probability from the program.

Suppose now that due to some mechanical defect the probability of the first gun firing its bullet (and therefore
killing the player) is not 1/6 but, say, 11/60. Then the probability atoms above will be replaced by

$\mathit{pr}_{r(1)}(\mathit{fatal}(1)) = 11/60.$
$\mathit{pr}_{r(2)}(\mathit{fatal}(2)) = 1/6.$

The probability of the player's death defined by the new program will be approximately 0.32. Obviously, both programs satisfy
Conditions 1 and 2 above.
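
Both intuitive values can be checked by hand. The calculation below is only a sanity check, assuming (as the story states) that the two guns act independently, so the player survives exactly when neither shot is fatal; $P'$ is used here, purely for illustration, for the probability under the defective first gun.

$$P(\mathit{is\_dead}) = 1 - \frac{5}{6} \cdot \frac{5}{6} = \frac{11}{36}, \qquad P'(\mathit{is\_dead}) = 1 - \frac{49}{60} \cdot \frac{5}{6} = \frac{23}{72} \approx 0.32$$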

Note however that the somewhat similar program

gun = {1, 2}.

pull_trigger : gun → boolean.
is_dead : boolean.
[r(G)] : random(is_dead) ← pull_trigger(G).
pull_trigger(G).

does not satisfy Condition 1 and hence will not be allowed in P-log. □
The next example presents a slightly different version of reasoning with multiple causes. 
Example 7 

[Multiple Causes: The casino story] 

A roulette wheel has 38 slots, two of which are green. Normally, the ball falls into one of these slots at random. 
However, the game operator and the casino owner each have buttons they can press which "rig" the wheel so 
that the ball falls into slot 0, which is green, with probability 1/2, while the remaining slots are all equally likely. 
The game is rigged in the same way no matter which button is pressed, or if both are pressed. In this example, 
the rigging of the game can be viewed as having two causes. Suppose in this particular game both buttons were 
pressed. What is the probability of the ball falling into slot 0? 

The story can be represented in P-log as follows:

slot = {zero, double_zero, 1..36}.

button = {1, 2}.

pressed : button → boolean.

rigged : boolean.

falls_in : slot.

[r] : random(falls_in).

rigged ← pressed(B).

¬rigged ← not rigged.

pressed(B).

$\mathit{pr}_r(\mathit{falls\_in} = \mathit{zero} \mid_c \mathit{rigged}) = 1/2.$

Intuitively, the probability of the ball falling into slot zero is 1/2. The same result will be obtained by our formal
semantics. Note that the program obviously satisfies Conditions 1 and 2. However the following similar program
violates Condition 2:

slot = {zero, double_zero, 1..36}.

button = {1, 2}.

pressed : button → boolean.

falls_in : slot.

[r] : random(falls_in).

pressed(B).

$\mathit{pr}_r(\mathit{falls\_in} = \mathit{zero} \mid_c \mathit{pressed}(B)) = 1/2.$

Condition 2 is violated here because two separate pr-atoms each assign probability to the literal falls_in = zero.
Some other probabilistic logic languages allow this, employing various systems of "combination rules" to compute 
the overall probabilities of literals whose probability values are multiply assigned. The study of combination rules 
is quite complex, and so we avoid it here for simplicity. □ 

Condition 3 

[No probabilities assigned outside of dynamic range] 
If $\Pi$ contains a random selection rule

$$[\,r\,]\ \mathit{random}(a(t) : \{Y : p(Y)\}) \leftarrow B_1.$$

along with probability atom

$$\mathit{pr}_r(a(t) = y \mid_c B_2) = v.$$

then no possible world of $\Pi$ satisfies $B_1$ and $B_2$ and $\mathit{not}\ \mathit{intervene}(a(t))$ but fails to satisfy $p(y)$. □

The condition ensures that probabilities are only assigned to logically possible outcomes of random selections. It
immediately follows from the intuitive reading of random selection rules and probability atoms.

To better understand the intuition behind our definition of probabilistic measure it may be useful to consider an
intelligent agent in the process of constructing his possible worlds. Suppose he has already constructed a part $V$
of a (not yet completely constructed) possible world $W$, and suppose that $V$ satisfies the precondition of some
random selection rule $r$. The agent can continue his construction by considering a random experiment associated
with $r$. If $y$ is a possible outcome of this experiment then the agent may continue his construction by adding the
atom $a(t) = y$ to $V$. To define the probabilistic measure $\mu$ of the possible world $W$ under construction, we need
to know the likelihood of $y$ being the outcome of $r$, which we will call the causal probability of the atom $a(t) = y$
in $W$. This information can be obtained from a pr-atom $\mathit{pr}_r(a(t) = y) = v$ of our program or computed using the
principle of indifference. In the latter case we need to consider the collection $R$ of possible outcomes of experiment
$r$. For example if $y \in R$, there is no probability atom assigning probability to outcomes of $R$, and $|R| = n$, then
the causal probability of $a(t) = y$ in $W$ will be $1/n$.

Let $v$ be the causal probability of $a(t) = y$. The atom $a(t) = y$ may be dependent, in the usual probabilistic
sense, with other atoms already present in the construction. However $v$ is not read as the probability of $a(t) = y$,
but the probability that, given what the agent knows about the possible world at this point in the construction, the
experiment determining the value of $a(t)$ will have a certain result. Our assumption is that these experiments are
independent, and hence it makes sense that $v$ will have a multiplicative effect on the probability of the possible
world under construction. (This approach should be familiar to those accustomed to working with Bayesian nets.)
This intuition will be captured by the following definitions.

Definition 2 
[Possible outcomes] 

Let $W$ be a consistent set of literals of $\Sigma$, $\Pi$ be a P-log program, $a$ be an attribute, and $y$ belong to the range of $a$.
We say that the atom $a(t) = y$ is possible in $W$ with respect to $\Pi$ if $\Pi$ contains a random selection rule $r$ for $a(t)$,
where if $r$ specifies a dynamic range $\{Y : p(Y)\}$ then $p(y) \in W$ and $W$ satisfies $B$, and if $r$ omits the dynamic range then $W$ satisfies $B$. We
also say that $y$ is a possible outcome of $a(t)$ in $W$ with respect to $\Pi$ via rule $r$, and that $r$ is a generating rule for
the atom $a(t) = y$. □

Recall that, based on our convention, if the range of $a$ is boolean then we can just say that $a(t)$ and $\neg a(t)$ are
possible in $W$. (Note that by Condition 1, if $W$ is a possible world of $\Pi$ then each atom possible in $W$ has exactly
one generating rule.)

Note that, as discussed above, there is some subtlety here because we are describing $a(t) = y$ as possible, though
not necessarily true, with respect to a particular set of literals and program $\Pi$.

For every $W \in \Omega(\Pi)$ and every atom $a(t) = y$ possible in $W$ we will define the corresponding causal probability
$P(W, a(t) = y)$. Whenever possible, the probability of an atom $a(t) = y$ will be directly assigned by pr-atoms
of the program and denoted by $PA(W, a(t) = y)$. To define probabilities of the remaining atoms we assume that
by default, all values of a given attribute which are not assigned a probability are equally likely. Their probabilities
will be denoted by $PD(W, a(t) = y)$. ($PA$ stands for assigned probability and $PD$ stands for default probability.)

For each atom $a(t) = y$ possible in $W$:

1. Assigned probability:

If $\Pi$ contains $\mathit{pr}_r(a(t) = y \mid_c B) = v$ where $r$ is the generating rule of $a(t) = y$, $B \subseteq W$, and $W$ does not
contain $\mathit{intervene}(a(t))$, then

$$PA(W, a(t) = y) = v$$

2. Default probability:

For any set $S$, let $|S|$ denote the cardinality of $S$. Let $A_{a(t)}(W) = \{y \mid PA(W, a(t) = y) \text{ is defined}\}$, and
$a(t) = y$ be possible in $W$ such that $y \notin A_{a(t)}(W)$. Then let

$$\alpha_{a(t)}(W) = \sum_{y \in A_{a(t)}(W)} PA(W, a(t) = y)$$

$$\beta_{a(t)}(W) = |\{y : a(t) = y \text{ is possible in } W \text{ and } y \notin A_{a(t)}(W)\}|$$

$$PD(W, a(t) = y) = \frac{1 - \alpha_{a(t)}(W)}{\beta_{a(t)}(W)}$$

3. Finally, the causal probability $P(W, a(t) = y)$ of $a(t) = y$ in $W$ is defined by:

$$P(W, a(t) = y) = \begin{cases} PA(W, a(t) = y) & \text{if } y \in A_{a(t)}(W) \\ PD(W, a(t) = y) & \text{otherwise.} \end{cases}$$



Example 8 

[Dice example continued: P-log program Ti] 

Recall the P-log program $T_1$ from Example 5. The program contains the following probabilistic information:

$\mathit{pr}(\mathit{roll}(d_1) = i \mid_c \mathit{owner}(d_1) = \mathit{mike}) = 3/20$, for each $i$ such that $1 \leq i \leq 5$.
$\mathit{pr}(\mathit{roll}(d_1) = 6 \mid_c \mathit{owner}(d_1) = \mathit{mike}) = 1/4$.

$\mathit{pr}(\mathit{roll}(d_2) = i \mid_c \mathit{owner}(d_2) = \mathit{john}) = 1/6$, for each $i$ such that $1 \leq i \leq 6$.

We now consider a possible world

$W = \{\mathit{owner}(d_1, \mathit{mike}), \mathit{owner}(d_2, \mathit{john}), \mathit{roll}(d_1, 6), \mathit{roll}(d_2, 3), \ldots\}$

of $T_1$ and compute $P(W, \mathit{roll}(d_i) = j)$ for every die $d_i$ and every possible score $j$.

According to the above definition, $PA(W, \mathit{roll}(d_i) = j)$ and $P(W, \mathit{roll}(d_i) = j)$ are defined for every random
atom (i.e. atom formed by a random attribute) $\mathit{roll}(d_i) = j$ in $W$ as follows:

$P(W, \mathit{roll}(d_1) = i) = PA(W, \mathit{roll}(d_1) = i) = 3/20$, for each $i$ such that $1 \leq i \leq 5$.
$P(W, \mathit{roll}(d_1) = 6) = PA(W, \mathit{roll}(d_1) = 6) = 1/4$.

$P(W, \mathit{roll}(d_2) = i) = PA(W, \mathit{roll}(d_2) = i) = 1/6$, for each $i$ such that $1 \leq i \leq 6$. □

Example 9

[Dice example continued: P-log program $T_{1.1}$]

In the previous example all random atoms of $W$ were assigned probabilities. Let us now consider what will happen
if explicit probabilistic information is omitted. Let $D_{3.1}$ be obtained from $D_3$ by removing all probability atoms
except

$\mathit{pr}(\mathit{roll}(D) = 6 \mid_c \mathit{owner}(D) = \mathit{mike}) = 1/4.$

Let $T_{1.1}$ be the P-log program consisting of $D_1$, $D_2$ and $D_{3.1}$ and let $W$ be as in the previous example. Only the
atom $\mathit{roll}(d_1) = 6$ will be given an assigned probability:

$P(W, \mathit{roll}(d_1) = 6) = PA(W, \mathit{roll}(d_1) = 6) = 1/4.$

The remaining atoms receive the expected default probabilities:

$P(W, \mathit{roll}(d_1) = i) = PD(W, \mathit{roll}(d_1) = i) = 3/20$, for each $i$ such that $1 \leq i \leq 5$.

$P(W, \mathit{roll}(d_2) = i) = PD(W, \mathit{roll}(d_2) = i) = 1/6$, for each $i$ such that $1 \leq i \leq 6$. □
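
The default values in the example above can be reproduced with a few lines of Python. This is a minimal sketch of the assigned/default split only, under the assumption that the set of possible outcomes and the applicable pr-atoms for one attribute term in one world have already been determined; the function name `causal_probabilities` and its arguments are hypothetical.

```python
from fractions import Fraction

def causal_probabilities(possible, assigned):
    """Return P(W, a(t) = y) for every possible outcome y.

    possible: outcomes y such that a(t) = y is possible in W
    assigned: dict y -> PA(W, a(t) = y) for outcomes covered by a pr-atom
    """
    alpha = sum(assigned.values(), Fraction(0))   # total assigned mass
    unassigned = [y for y in possible if y not in assigned]
    result = dict(assigned)
    for y in unassigned:
        # default probability: the remaining mass is split evenly
        result[y] = (1 - alpha) / len(unassigned)
    return result

# Die d1 with only pr(roll(d1) = 6) = 1/4 stated explicitly.
p_d1 = causal_probabilities(range(1, 7), {6: Fraction(1, 4)})
# Die d2 with no pr-atoms at all.
p_d2 = causal_probabilities(range(1, 7), {})
print(p_d1[1], p_d1[6], p_d2[3])
```

Exact fractions are used so that the results can be compared directly with the values 3/20, 1/4 and 1/6 given in the text.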

Now we are ready to define the measure, $\mu_\Pi$, induced by the P-log program $\Pi$.

Definition 3
[Measure]

1. Let $W$ be a possible world of $\Pi$. The unnormalized probability, $\hat{\mu}_\Pi(W)$, of a possible world $W$ induced by
$\Pi$ is

$$\hat{\mu}_\Pi(W) = \prod_{a(t, y) \in W} P(W, a(t) = y)$$

where the product is taken over atoms for which $P(W, a(t) = y)$ is defined.

2. Suppose $\Pi$ is a P-log program having at least one possible world with nonzero unnormalized probability.
The measure, $\mu_\Pi(W)$, of a possible world $W$ induced by $\Pi$ is the unnormalized probability of $W$ divided
by the sum of the unnormalized probabilities of all possible worlds of $\Pi$, i.e.,

$$\mu_\Pi(W) = \frac{\hat{\mu}_\Pi(W)}{\sum_{W_i \in \Omega} \hat{\mu}_\Pi(W_i)}$$

When the program $\Pi$ is clear from the context we may simply write $\hat{\mu}$ and $\mu$ instead of $\hat{\mu}_\Pi$ and $\mu_\Pi$ respectively. □

The unnormalized measure of a possible world $W$ corresponds, from the standpoint of classical probability, to the
unconditional probability of $W$. Each random atom $a(t) = y$ in $W$ is thought of as the outcome of a random
experiment that takes place in the construction of $W$, and $P(W, a(t) = y)$ is the probability of that experiment
having the result $a(t) = y$ in $W$. The multiplication in the definition of unnormalized measure is justified by an
assumption that all experiments performed in the construction of $W$ are independent. This is subtle because the
experiments themselves do not show up in $W$; only their results do, and the results may not be independent.

Example 10 

[Dice example continued: $T_1$ and $T_{1.1}$]

The measures of the possible worlds of Example 9 are given by

$\mu(\{\mathit{roll}(d_1, 6), \mathit{roll}(d_2, y), \ldots\}) = 1/24$, for $1 \leq y \leq 6$, and
$\mu(\{\mathit{roll}(d_1, u), \mathit{roll}(d_2, y), \ldots\}) = 1/40$, for $1 \leq u \leq 5$ and $1 \leq y \leq 6$,

where only random atoms of each possible world are shown. □
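
These measures follow mechanically from the definition of measure. The Python sketch below multiplies the causal probabilities of the two random atoms of each world and normalizes. It assumes the causal probabilities of the dice example are world-independent and hard-codes them in the illustrative table `P`; none of the names belong to an actual P-log system.

```python
from fractions import Fraction
from itertools import product

# Causal probabilities P(W, roll(d) = y); they do not depend on W here.
P = {
    "d1": {**{i: Fraction(3, 20) for i in range(1, 6)}, 6: Fraction(1, 4)},
    "d2": {i: Fraction(1, 6) for i in range(1, 7)},
}

def unnormalized(world):
    """Product of the causal probabilities of the random atoms of a world."""
    result = Fraction(1)
    for die, score in world.items():
        result *= P[die][score]
    return result

worlds = [dict(zip(("d1", "d2"), s)) for s in product(range(1, 7), repeat=2)]
total = sum(unnormalized(w) for w in worlds)       # normalizing factor
measure = {tuple(w.values()): unnormalized(w) / total for w in worlds}

print(total)              # sum of the unnormalized probabilities
print(measure[(6, 3)])    # a world with roll(d1) = 6
print(measure[(2, 3)])    # a world with roll(d1) != 6
```

In this example the normalizing factor is 1, so the unnormalized probability and the measure of each world coincide.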
Now we are ready for our main definition. 



Footnote: For instance, in the upcoming Example 18, random attributes arsenic and death respectively reflect whether or not a given rat eats arsenic,
and whether or not it dies. In that example, death and arsenic are clearly dependent. However, we assume that the factors which determine
whether a poisoning will lead to death (such as the rat's constitution, and the strength of the poison) are independent of the factors which
determine whether poisoning occurred in the first place.

Definition 4
[Probability]

Suppose $\Pi$ is a P-log program having at least one possible world with nonzero unnormalized probability. The
probability, $P_\Pi(E)$, of a set $E$ of possible worlds of program $\Pi$ is the sum of the measures of the possible worlds
from $E$, i.e.

$$P_\Pi(E) = \sum_{W \in E} \mu_\Pi(W).$$

□

When $\Pi$ is clear from the context we may simply write $P$ instead of $P_\Pi$.

The function $P_\Pi$ is not always defined, since not every syntactically correct P-log program satisfies the condition
of having at least one possible world with nonzero unnormalized measure. Consider for instance a program $\Pi$
consisting of facts

p(a).
¬p(a).

The program has no answer sets at all, and hence here $P_\Pi$ is not defined. The following proposition, however,
says that when $P_\Pi$ is defined, it satisfies the Kolmogorov axioms of probability. This justifies our use of the term
"probability" for the function $P_\Pi$. The proposition follows straightforwardly from the definition.

Proposition 1 
[Kolmogorov Axioms] 

For a P-log program $\Pi$ for which the function $P_\Pi$ is defined we have

1. For any set $E$ of possible worlds of $\Pi$, $P_\Pi(E) \geq 0$.

2. If $\Omega$ is the set of all possible worlds of $\Pi$ then $P_\Pi(\Omega) = 1$.

3. For any disjoint subsets $E_1$ and $E_2$ of possible worlds of $\Pi$, $P_\Pi(E_1 \cup E_2) = P_\Pi(E_1) + P_\Pi(E_2)$. □

In logic-based probability theory a set $E$ of possible worlds is often represented by a propositional formula $F$ such
that $W \in E$ iff $W$ is a model of $F$. In this case the probability function may be defined on propositions as

$$P(F) =_{\mathit{def}} P(\{W : W \text{ is a model of } F\}).$$

The value of $P(F)$ is interpreted as the degree of reasoner's belief in $F$. A similar idea can be used in our framework.
But since the connectives of Answer Set Prolog are different from those of Propositional Logic the notion
of propositional formula will be replaced by that of formula of Answer Set Prolog (ASP formula). In this paper we
limit our discussion to a relatively simple class of ASP formulas which is sufficient for our purpose.

Definition 5 

[ASP Formulas (syntax)] 
For any signature $\Sigma$

• An extended literal of $\Sigma$ is an ASP formula.

• If $A$ and $B$ are ASP formulas then $(A \wedge B)$ and $(A\ \mathit{or}\ B)$ are ASP formulas. □

For example, $((p \wedge \mathit{not}\ q \wedge \neg r)\ \mathit{or}\ (\mathit{not}\ r))$ is an ASP formula but $(\mathit{not}\ (\mathit{not}\ p))$ is not. A more general definition
of ASP formulas which allows the use of negations $\neg$ and $\mathit{not}$ in front of arbitrary formulas can be found in
(Lifschitz et al. 2001).



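Now we define the truth ($W \vdash A$) and falsity ($W \dashv A$) of an ASP formula $A$ with respect to a possible world $W$:

Definition 6

[ASP Formulas (semantics)]

1. For any $\Sigma$-literal $l$, $W \vdash l$ if $l \in W$; $W \dashv l$ if $\bar{l} \in W$, where $\bar{l}$ is the literal complementary to $l$.

2. For any extended $\Sigma$-literal $\mathit{not}\ l$, $W \vdash \mathit{not}\ l$ if $l \notin W$; $W \dashv \mathit{not}\ l$ if $l \in W$.

3. $W \vdash (A_1 \wedge A_2)$ if $W \vdash A_1$ and $W \vdash A_2$; $W \dashv (A_1 \wedge A_2)$ if $W \dashv A_1$ or $W \dashv A_2$.

4. $W \vdash (A_1\ \mathit{or}\ A_2)$ if $W \vdash A_1$ or $W \vdash A_2$; $W \dashv (A_1\ \mathit{or}\ A_2)$ if $W \dashv A_1$ and $W \dashv A_2$. □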

An ASP formula A which is neither true nor false in W is undefined in W. This introduces some subtlety. The 
axioms of modern mathematical probability are viewed as axioms about measures on sets of possible worlds, 
and as such are satisfied by P-log probability measures. However, since we are using a three-valued logic, some 
classical consequences of the axioms for the probabilities of formulae fail to hold. Thus, all theorems of classical 
probability theory can be applied in the context of P-log; but we must be careful how we interpret set operations in 
terms of formulae. For example, note that formula $(l\ \mathit{or}\ \mathit{not}\ l)$ is true in every possible world $W$. However formula
$(p\ \mathit{or}\ \neg p)$ is undefined in any possible world containing neither $p$ nor $\neg p$. Thus if $P$ is a P-log probability measure,
we will always have $P(\mathit{not}\ l) = 1 - P(l)$, but not necessarily $P(\neg l) = 1 - P(l)$.

Consider for instance an ASP program $P_1$ from the introduction. If we expand $P_1$ by the appropriate declarations
we obtain a program $\Pi_1$ of P-log. Its only possible world is $W_0 = \{p(a), \neg p(b), q(c)\}$. Since neither $p$ nor $q$ is
random, its measure, $\mu(W_0)$, is 1 (since the empty product is 1). However, since the truth value of $p(c)\ \mathit{or}\ \neg p(c)$
in $W_0$ is undefined, $P_{\Pi_1}(p(c)\ \mathit{or}\ \neg p(c)) = 0$. This is not surprising since $W_0$ represents a possible set of beliefs
of the agent associated with $\Pi_1$ in which $p(c)$ is simply ignored. (Note that the probability of formula $q(c)$ which
expresses this fact is properly equal to 1).
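
The three-valued reading of ASP formulas is easy to mechanize. The following Python sketch is one possible encoding, not the authors' implementation: a possible world is a set of strings, a leading '-' stands for classical negation, formulas are nested tuples, and `None` plays the role of "undefined". All names are illustrative.

```python
def neg(lit):
    """Complement of a literal; '-p' stands for classical negation of p."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def truth(world, formula):
    """Return True, False, or None (undefined) for an ASP formula.

    Formulas are nested tuples: a literal string, ("not", literal),
    ("and", f1, f2) or ("or", f1, f2).
    """
    if isinstance(formula, str):
        if formula in world:
            return True
        return False if neg(formula) in world else None
    op = formula[0]
    if op == "not":
        return formula[1] not in world
    left, right = truth(world, formula[1]), truth(world, formula[2])
    if op == "and":
        if left is True and right is True:
            return True
        return False if (left is False or right is False) else None
    if op == "or":
        if left is True or right is True:
            return True
        return False if (left is False and right is False) else None
    raise ValueError(f"unknown connective: {op}")

w0 = {"p(a)", "-p(b)", "q(c)"}
print(truth(w0, ("or", "p(c)", "-p(c)")))           # undefined in w0
print(truth(w0, ("or", "p(c)", ("not", "p(c)"))))   # true in every world
```

Since the probability of a formula sums only over worlds in which the formula is true, an undefined formula such as the first one contributes nothing, which is why its probability in the example above is 0.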

Let us now look at program $\Pi_2$ obtained from $\Pi_1$ by declaring $p$ to be a random attribute. This time $p(c)$ is not
ignored. Instead the agent considers two possibilities and constructs two complete possible worlds:

$W_1 = \{p(a), \neg p(b), p(c), \neg q(c)\}$ and
$W_2 = \{p(a), \neg p(b), \neg p(c), \neg q(c)\}$.

Obviously $P_{\Pi_2}(p(c)\ \mathit{or}\ \neg p(c)) = 1$.

It is easy to check that if all possible worlds of a P-log program $\Pi$ are complete then $P_\Pi(l\ \mathit{or}\ \neg l) = 1$. This is the
case for instance when $\Pi$ contains no regular part, or when the regular part of $\Pi$ consists of definitions of relations
$p_1, \ldots, p_n$ (where a definition of a relation $p$ is a collection of rules which determines the truth value of atoms
built from $p$ to be true or false in all possible worlds).

Now the definition of probability can be expanded to ASP formulas.

Definition 7

[Probability of Formulas]

The probability with respect to program $\Pi$ of a formula $A$, $P_\Pi(A)$, is the sum of the measures of the possible
worlds of $\Pi$ in which $A$ is true, i.e.

$$P_\Pi(A) = \sum_{W \vdash A} \mu_\Pi(W).$$

□

As usual when convenient we omit $\Pi$ and simply write $P$ instead of $P_\Pi$.

Footnote: A possible world $W$ of program $\Pi$ is called complete if for any ground atom $a$ from the signature of $\Pi$, $a \in W$ or $\neg a \in W$.



Probabilistic reasoning with answer sets 






Example 11 

[Dice example continued] 

Let $T_1$ be the program from Example 5. Then, using the measures computed in Example 10 and the definition of
probability we have, say

$P_{T_1}(\mathit{roll}(d_1) = 6) = 6 * (1/24) = 1/4.$

$P_{T_1}(\mathit{roll}(d_1) = 6 \wedge \mathit{even}(d_2)) = 3 * (1/24) = 1/8.$ □



Example 12 

[Causal probability equal to 1] 

Consider the P-log program $\Pi_0$ consisting of:

a : boolean.
random a.
pr(a) = 1.

The translation of its logical part, $\tau(\Pi_0)$, will consist of the following:

intervene(a) ← do(a).

intervene(a) ← do(¬a).

a or ¬a ← not intervene(a).

← obs(a), not a.

← obs(¬a), not ¬a.

a ← do(a).

¬a ← do(¬a).

$\tau(\Pi_0)$ has two answer sets $W_1 = \{a, \ldots\}$ and $W_2 = \{\neg a, \ldots\}$. The probabilistic part of $\Pi_0$ will lead to the
following probability assignments.

$P(W_1, a) = 1.$
$P(W_1, \neg a) = 0.$
$P(W_2, a) = 1.$
$P(W_2, \neg a) = 0.$

$\hat{\mu}_{\Pi_0}(W_1) = 1.$
$\hat{\mu}_{\Pi_0}(W_2) = 0.$

This gives us $P_{\Pi_0}(a) = 1$. □
Example 13 

[Guns example continued] 

Let $\Pi_g$ be the P-log program from Example 6. It is not difficult to check that the program has four possible
worlds. All four contain {gun(1), gun(2), pull_trigger(1), pull_trigger(2)}. Suppose now that $W_1$ contains
{fatal(1), ¬fatal(2)}, $W_2$ contains {¬fatal(1), fatal(2)}, $W_3$ contains {fatal(1), fatal(2)}, and $W_4$ contains
{¬fatal(1), ¬fatal(2)}. The first three worlds contain is_dead, the last one contains ¬is_dead. Then

$\mu_{\Pi_g}(W_1) = 1/6 * 5/6 = 5/36.$
$\mu_{\Pi_g}(W_2) = 5/6 * 1/6 = 5/36.$
$\mu_{\Pi_g}(W_3) = 1/6 * 1/6 = 1/36.$
$\mu_{\Pi_g}(W_4) = 5/6 * 5/6 = 25/36.$

and hence

$P_{\Pi_g}(\mathit{is\_dead}) = 11/36.$ □

As expected, this is exactly the intuitive answer from Example 6. A similar argument can be used to compute
the probability of rigged from Example 7.

Even if $P_\Pi$ satisfies the Kolmogorov axioms it may still contain questionable probabilistic information. For instance
a program containing statements $\mathit{pr}(p) = 1$ and $\mathit{pr}(\neg p) = 1$ does not seem to have a clear intuitive
meaning. The next definition is meant to capture the class of programs which are logically and probabilistically
coherent.

Definition 8 
[Program Coherency] 

Let $\Pi$ be a P-log program and $\Pi'$ be obtained from $\Pi$ by removing all observations and actions. $\Pi$ is said to be
consistent if $\Pi$ has at least one possible world.

We will say that a consistent program $\Pi$ is coherent if

• $P_\Pi$ is defined.

• For every selection rule $r$ with the premise $K$ and every probability atom $\mathit{pr}_r(a(t) = y \mid_c B) = v$ of $\Pi$, if
$P_{\Pi'}(B \cup K)$ is not equal to 0 then $P_{\Pi' \cup \mathit{obs}(B) \cup \mathit{obs}(K)}(a(t) = y) = v$. □

Coherency intuitively says that causal probabilities entail corresponding conditional probabilities. We now give 
two examples of programs whose probability functions are defined, but which are not coherent. 

Example 14 

Consider the programs $\Pi_5$:

a : boolean.
random a.
a.

pr(a) = 1/2.

and $\Pi_6$:

a : {0, 1, 2}.
random a.

pr(a = 0) = pr(a = 1) = pr(a = 2) = 1/2.

Neither program is coherent. $\Pi_5$ has one possible world $W = \{a\}$. We have $\hat{\mu}_{\Pi_5}(W) = 1/2$, $\mu_{\Pi_5}(W) = 1$, and
$P_{\Pi_5}(a) = 1$. Since $\mathit{pr}(a) = 1/2$, $\Pi_5$ violates condition (2) of coherency.

$\Pi_6$ has three possible worlds, $\{a = 0\}$, $\{a = 1\}$, and $\{a = 2\}$, each with unnormalized probability 1/2. Hence
$P_{\Pi_6}(a = 0) = 1/3$, which is different from $\mathit{pr}(a = 0)$ which is 1/2, thus making $\Pi_6$ incoherent. □
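
The mismatch in the second program can be checked numerically. The Python sketch below assumes, as stated in the example, that each possible world contains a single random atom, so its unnormalized probability is just the value given by the corresponding pr-atom. The variable names are illustrative, and the comparison is only the instance of the coherency condition that applies to this unconditional case.

```python
from fractions import Fraction

# pr(a = y) = 1/2 for each y in {0, 1, 2}, as in the second program.
pr = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(1, 2)}

# Each possible world {a = y} has unnormalized probability pr[y].
total = sum(pr.values())
prob = {y: pr[y] / total for y in pr}

# Coherency requires the computed probability to match the pr-atom.
violations = [y for y in pr if prob[y] != pr[y]]
print(prob[0], violations)
```

Normalization rescales the three values to 1/3 each, so every pr-atom of the program disagrees with the probability the program actually induces.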

The following two propositions give conditions on the probability atoms of a P-log program which are necessary
for its coherency.

Proposition 2

Let $\Pi$ be a coherent P-log program without any observations or actions, and $a(t)$ be an attribute term from the
signature of $\Pi$. Suppose that $\Pi$ contains a selection rule

$$[\,r\,]\ \mathit{random}(a(t) : \{X : p(X)\}) \leftarrow B_1.$$

and there is a subset $c = \{y_1, \ldots, y_n\}$ of the range of $a(t)$ such that for every possible world $W$ of $\Pi$ satisfying
$B_1$, we have $\{Y : W \vdash p(Y)\} = \{y_1, \ldots, y_n\}$. Suppose also that for some fixed $B_2$, $\Pi$ contains probability
atoms of the form

$$\mathit{pr}_r(a(t) = y_i \mid_c B_2) = p_i.$$

for all $1 \leq i \leq n$. Then

$$P_\Pi(B_1 \wedge B_2) = 0 \quad \text{or} \quad \sum_{i=1}^{n} p_i = 1$$

□

Proof: Let $\hat{\Pi} = \Pi \cup \mathit{obs}(B_1) \cup \mathit{obs}(B_2)$ and let $P_\Pi(B_1 \wedge B_2) \neq 0$. From this, together with rule (12) from the
definition of the mapping $\tau$ from Section 3.1, we have that $\hat{\Pi}$ has a possible world with non-zero probability. Hence
by Proposition 1, $P_{\hat{\Pi}}$ satisfies the Kolmogorov Axioms. By Condition 2 of coherency, we have $P_{\hat{\Pi}}(a(t) = y_i) = p_i$,
for all $1 \leq i \leq n$. By rule (12) of the definition of $\tau$ we have that every possible world of $\hat{\Pi}$ satisfies $B_1$.
This, together with rules (8), (10) and (11) from the same definition, implies that every possible world of $\hat{\Pi}$ contains
exactly one literal of the form $a(t) = y$ where $y \in c$. Since $P_{\hat{\Pi}}$ satisfies the Kolmogorov axioms we have that if
$\{F_1, \ldots, F_n\}$ is a set of literals exactly one of which is true in every possible world of $\hat{\Pi}$ then

$$\sum_{i=1}^{n} P_{\hat{\Pi}}(F_i) = 1$$

This implies that

$$\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} P_{\hat{\Pi}}(a(t) = y_i) = 1$$
The proof of the following is similar: 
Proposition 3 

Let $\Pi$ be a coherent P-log program without any observations or actions, and $a(t)$ be an attribute term from the
signature of $\Pi$. Suppose that $\Pi$ contains a selection rule

$$[\,r\,]\ \mathit{random}(a(t) : \{X : p(X)\}) \leftarrow B_1.$$

and there is a subset $c = \{y_1, \ldots, y_n\}$ of the range of $a(t)$ such that for every possible world $W$ of $\Pi$ satisfying
$B_1$, we have $\{Y : W \vdash p(Y)\} = \{y_1, \ldots, y_n\}$. Suppose also that for some fixed $B_2$, $\Pi$ contains probability
atoms of the form

$$\mathit{pr}_r(a(t) = y_i \mid_c B_2) = p_i.$$

for some $1 \leq i \leq n$. Then

$$P_\Pi(B_1 \wedge B_2) = 0 \quad \text{or} \quad \sum_{i=1}^{n} p_i \leq 1$$

□

In this section we address the problem of belief updating: the ability of an agent to change degrees of belief
defined by his current knowledge base. If $T$ is a P-log program and $U$ is a collection of statements such that $T \cup U$
is coherent we call $U$ an update of $T$. Intuitively $U$ is viewed as new information which can be added to an existing
knowledge base, $T$. Explicit representation of the agent's beliefs allows for a natural treatment of belief updates
in P-log. The reasoner should simply add the new knowledge $U$ to $T$ and check that the result is coherent. If it is
then the new degrees of the reasoner's beliefs are given by the function $P_{T \cup U}$. As mentioned before we plan to
expand our work on P-log by allowing its regular part to be a program in CR-Prolog (Balduccini and Gelfond 2003)
which has a much more liberal notion of consistency than Answer Set Prolog. The resulting language will allow a
substantially larger set of possible updates.

In what follows we compare and contrast different types of updates and investigate their relationship with the 
updating mechanisms of more traditional Bayesian approaches. 

4.1 P-log Updates and Conditional Probability 

In Bayesian probability theory the notion of conditional probability is used as the primary mechanism for updating
beliefs in light of new information. If $P$ is a probability measure (induced by a P-log program or otherwise), then
the conditional probability $P(A|B)$ is defined as $P(A \wedge B)/P(B)$, provided $P(B)$ is not 0. Intuitively, $P(A|B)$
is understood as the probability of a formula $A$ with respect to a background theory and a set $B$ of all of the
agent's additional observations of the world. The new evidence $B$ simply eliminates the possible worlds which do
not satisfy $B$. To emulate this type of reasoning in P-log we first assume that the only formulas observable by the
agent are literals. (The restriction is needed to stay in the syntactic boundaries of our language. As mentioned in
Section 2 this restriction is not essential and can be eliminated by using a syntactically richer version of Answer
Set Prolog.) The next theorem gives a relationship between classical conditional probability and updates in P-log.
Recall that if $B$ is a set of literals, adding the observation $\mathit{obs}(B)$ to a program $\Pi$ has the effect of removing all
possible worlds of $\Pi$ which fail to satisfy $B$.

Proposition 4 

[Conditional Probability in P-log] 

For any coherent P-log program $T$, formula $A$, and a set of $\Sigma$-literals $B$ such that $P_T(B) \neq 0$,

$$P_{T \cup \mathit{obs}(B)}(A) = P_T(A \wedge B)/P_T(B)$$

In other words,

$$P_T(A|B) = P_{T \cup \mathit{obs}(B)}(A)$$

□ 

Proof: 

Let us order all possible worlds of $T$ in such a way that
$\{w_1, \ldots, w_j\}$ is the set of all possible worlds of $T$ that contain both $A$ and $B$,
$\{w_1, \ldots, w_l\}$ is the set of all possible worlds of $T$ that contain $B$, and
$\{w_1, \ldots, w_n\}$ is the set of all possible worlds of $T$.

Programs of Answer Set Prolog are monotonic with respect to constraints, i.e. for any program $\Pi$ and a set of
constraints $C$, $X$ is an answer set of $\Pi \cup C$ iff it is an answer set of $\Pi$ satisfying $C$. Hence the possible worlds of
$T \cup \mathit{obs}(B)$ will be all and only those of $T$ that satisfy $B$. In what follows, we will write $\mu$ and $\hat{\mu}$ for $\mu_T$ and $\hat{\mu}_T$,
respectively. Now, by the definition of probability in P-log, if $P_T(B) \neq 0$, then

$$P_{T \cup \mathit{obs}(B)}(A) = \frac{\sum_{i=1}^{j} \hat{\mu}(w_i)}{\sum_{i=1}^{l} \hat{\mu}(w_i)}$$

Now if we divide both the numerator and denominator by the normalizing factor for $T$, we have

$$\frac{\sum_{i=1}^{j} \hat{\mu}(w_i)}{\sum_{i=1}^{l} \hat{\mu}(w_i)} = \frac{\sum_{i=1}^{j} \mu(w_i)}{\sum_{i=1}^{l} \mu(w_i)} = \frac{P_T(A \wedge B)}{P_T(B)}$$

This completes the proof.



□ 



Example 15 

[Dice example: upgrading the degree of belief] 

Let us consider program $T_1$ from Example 8 and a new observation $even(d_2)$. To see the influence of this new evidence
on the probability of $d_2$ showing a 4 we can compute $P_{T_2}(roll(d_2) = 4)$ where $T_2 = T_1 \cup \{obs(even(d_2))\}$.
Addition of the new observation eliminates those possible worlds of $T_1$ in which the score of $d_2$ is not even. $T_2$
has 18 possible worlds. Three of them, containing $roll(d_1) = 6$, have the unnormalized probabilities 1/24 each.
The unnormalized probability of every other possible world is 1/40. Their measures are respectively 1/12 and
1/20, and hence $P_{T_2}(roll(d_2) = 4) = 1/3$. By Proposition 2 the same result can be obtained by computing
standard conditional probability $P_{T_1}(roll(d_2) = 4 \mid even(d_2))$. □
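
The numbers in Example 15 can be checked by brute-force enumeration. The following Python sketch is only an illustration and is not part of the P-log system. It assumes the parameters of $T_1$ that are implied by the unnormalized probabilities quoted above: $d_1$ shows 6 with probability 1/4 and every other face with probability 3/20, while $d_2$ is fair. The names `roll_prob`, `possible_worlds` and `probability` are hypothetical helpers introduced here.

```python
from fractions import Fraction
from itertools import product

# Assumption: d1 shows 6 with probability 1/4 and each other face with
# probability 3/20; d2 is fair. These values are inferred from the
# unnormalized probabilities 1/24 and 1/40 quoted in Example 15.
def roll_prob(die, face):
    if die == "d1":
        return Fraction(1, 4) if face == 6 else Fraction(3, 20)
    return Fraction(1, 6)

def possible_worlds(observation=lambda r1, r2: True):
    """Map each world (r1, r2) that satisfies the observation to its
    unnormalized probability."""
    return {
        (r1, r2): roll_prob("d1", r1) * roll_prob("d2", r2)
        for r1, r2 in product(range(1, 7), repeat=2)
        if observation(r1, r2)
    }

def probability(worlds, event):
    """Normalized probability of an event over a set of weighted worlds."""
    total = sum(worlds.values())
    return sum(w for (r1, r2), w in worlds.items() if event(r1, r2)) / total

d2_is_4 = lambda r1, r2: r2 == 4
d2_even = lambda r1, r2: r2 % 2 == 0

t1 = possible_worlds()          # worlds of T1
t2 = possible_worlds(d2_even)   # worlds of T2 = T1 with obs(even(d2))

updated = probability(t2, d2_is_4)
# Classical conditioning on T1; roll(d2) = 4 already implies even(d2).
conditional = probability(t1, d2_is_4) / probability(t1, d2_even)
print(len(t2), updated, conditional)
```

Both routes give the same value, as the proposition on conditional probability predicts for this program.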

Now we consider a number of other types of P-log updates which will take us beyond the updating abilities of the
classical Bayesian approach. Let us start with an update of $T$ by

$$B = \{l_1, \ldots, l_n\} \qquad (14)$$

where $l$'s are literals.

To understand a substantial difference between updating $\Pi$ by $obs(l)$ and by a fact $l$ one should consider the ASP
counterpart $\tau(\Pi)$ of $\Pi$. The first update corresponds to expanding $\tau(\Pi)$ by the denial $\leftarrow not\ l$ while the second
expands $\tau(\Pi)$ by the fact $l$. As discussed in Appendix III, constraints and facts play different roles in the process
of forming the agent's beliefs about the world and hence one can expect that $\Pi \cup \{obs(l)\}$ and $\Pi \cup \{l\}$ may have
different possible worlds.

The following examples show that it is indeed the case.

Example 16

[Conditioning on $obs(l)$ versus conditioning on $l$]
Consider a P-log program $T$

    p : {y1, y2}.
    q : boolean.
    random(p).

    ¬q ← not q, p = y1.
    ¬q ← p = y2.

It is easy to see that no possible world of $T$ contains $q$ and hence $P_T(q) = 0$. Now consider the set $B = \{q, p = y_1\}$
of literals. The program $T \cup obs(B)$ has no possible worlds, and hence $P_{T \cup obs(B)}(q)$ is undefined. In
contrast, $T \cup B$ has one possible world, $\{q, p = y_1, \ldots\}$, and hence $P_{T \cup B}(q) = 1$. The update $B$ allowed the
reasoner to change its degree of belief in $q$ from 0 to 1, a thing impossible in the classical Bayesian framework. □

Note that since for $T$ and $B$ from Example 16 we have that $P_T(B) = 0$, the classical conditional probability
of $A$ given $B$ is undefined. Hence from the standpoint of classical probability Example 16 may not look very
surprising. Perhaps somewhat more surprisingly, $P_{T \cup obs(B)}(A)$ and $P_{T \cup B}(A)$ may be different even when the
classical conditional probability of $A$ given $B$ is defined.

Example 17

[Conditioning on $obs(l)$ versus conditioning on $l$]
Consider a P-log program $T$

    p : {y1, y2}.
    q : boolean.
    random(p).

    q ← p = y1.

    ¬q ← not q.

It is not difficult to check that program $T$ has two possible worlds, $W_1$, containing $\{p = y_1, q\}$, and $W_2$,
containing $\{p = y_2, \neg q\}$. Now consider an update $T \cup obs(q)$. It has one possible world, $W_1$. Program
$T \cup \{q\}$ is however different. It has two possible worlds, $W_1$ and $W_3$, where $W_3$ contains $\{p = y_2, q\}$;
$\mu_{T \cup \{q\}}(W_1) = \mu_{T \cup \{q\}}(W_3) = 1/2$. This implies that $P_{T \cup obs(q)}(p = y_1) = 1$ while $P_{T \cup \{q\}}(p = y_1) = 1/2$.
□

Note that in the above cases the new evidence contained a literal formed by an attribute, $q$, not explicitly defined as
random. Adding a fact $a(t) = y$ to a program for which $a(t)$ is random in some possible world will usually cause
the resulting program to be incoherent.

4.2 Updates Involving Actions 

Now we discuss updating the agent's knowledge by the effects of deliberate intervening actions, i.e. by a collection 
of statements of the form 

$$do(B) = \{do(a(t) = y) : (a(t) = y) \in B\} \qquad (15)$$

As before the update is simply added to the background theory. The results however are substantially different 
from the previous updates. The next example illustrates the difference. 

Example 18 
[Rat Example] 

Consider the following program, T, representing knowledge about whether a certain rat will eat arsenic today, and 
whether it will die today. 

    arsenic, death : boolean.
    [1] random(arsenic).
    [2] random(death).
    pr(arsenic) = 0.4.
    pr(death |c arsenic) = 0.8.
    pr(death |c ¬arsenic) = 0.01.

The above program tells us that the rat is more likely to die today if it eats arsenic. Not only that, the intuitive
semantics of the pr atoms expresses that the rat's consumption of arsenic carries information about the cause of its
death (as opposed to, say, the rat's death being informative about the causes of its eating arsenic).

An intuitive consequence of this reading is that seeing the rat die raises our suspicion that it has eaten arsenic,
while killing the rat (say, with a pistol) does not affect our degree of belief that arsenic has been consumed. The
following computations show that the principle is reflected in the probabilities computed under our semantics.

The possible worlds of the above program, with their unnormalized probabilities, are as follows (we show only
arsenic and death literals):

$w_1$: $\{arsenic, death\}$. $\hat{\mu}(w_1) = 0.4 \times 0.8 = 0.32$
$w_2$: $\{arsenic, \neg death\}$. $\hat{\mu}(w_2) = 0.4 \times 0.2 = 0.08$
$w_3$: $\{\neg arsenic, death\}$. $\hat{\mu}(w_3) = 0.6 \times 0.01 = 0.006$
$w_4$: $\{\neg arsenic, \neg death\}$. $\hat{\mu}(w_4) = 0.6 \times 0.99 = 0.594$

Since the unnormalized probabilities add up to 1, the respective measures are the same as the unnormalized
probabilities. Hence,

$$P_T(arsenic) = \mu(w_1) + \mu(w_2) = 0.32 + 0.08 = 0.4$$

To compute the probability of arsenic after the observation of death we consider the program $T_1 = T \cup \{obs(death)\}$.

The resulting program has two possible worlds, $w_1$ and $w_3$, with unnormalized probabilities as above. Normalization
yields

$$P_{T_1}(arsenic) = 0.32/(0.32 + 0.006) \approx 0.9816$$

Notice that the observation of death raised our degree of belief that the rat had eaten arsenic.

To compute the effect of $do(death)$ on the agent's belief in arsenic we augment the original program with the
literal $do(death)$. The resulting program, $T_2$, has two answer sets, $w_1$ and $w_3$. However, the action defeats the
randomness of death so that $w_1$ has unnormalized probability 0.4 and $w_3$ has unnormalized probability 0.6. These
sum to one so the measures are also 0.4 and 0.6 respectively, and we get

$$P_{T_2}(arsenic) = 0.4$$

Note this is identical to the initial probability $P_T(arsenic)$ computed above. In contrast to the case when the effect
(that is, death) was passively observed, deliberately bringing about the effect did not change our degree of belief
about the propositions relevant to the cause.

Propositions relevant to a cause, on the other hand, give equal evidence for the attendant effects whether they are
forced to happen or passively observed. For example, if we feed the rat arsenic, this increases its chance of death,
just as if we had observed the rat eating the arsenic on its own. The conditional probabilities computed under our
semantics bear this out. Similarly to the above, we can compute

$$P_T(death) = 0.326$$

$$P_{T \cup \{do(arsenic)\}}(death) = 0.8$$

$$P_{T \cup \{obs(arsenic)\}}(death) = 0.8$$ □
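
The contrast between observing and doing in the rat example can be reproduced by direct enumeration. The sketch below is a hand-written Python illustration of this one program, not the P-log interpreter. It uses the pr values exactly as stated in the program (0.4, 0.8 and 0.01), and it models an action simply by dropping the random factor of the acted-upon attribute, which is an assumption that holds for this small program. The names `worlds` and `prob` are hypothetical helpers.

```python
from itertools import product

# Causal probabilities as stated in the program of the rat example.
PR_ARSENIC = 0.4
PR_DEATH = {True: 0.8, False: 0.01}   # keyed by the value of arsenic

def worlds(obs=None, do=None):
    """Return {(arsenic, death): measure} after an optional update.

    obs and do are dicts such as {"death": True}. An observation only
    filters worlds. An action also removes the random selection of the
    attribute, so that attribute contributes a factor of 1.
    """
    obs, do = obs or {}, do or {}
    weights = {}
    for arsenic, death in product([True, False], repeat=2):
        world = {"arsenic": arsenic, "death": death}
        if any(world[k] != v for k, v in {**obs, **do}.items()):
            continue
        w = 1.0
        if "arsenic" not in do:
            w *= PR_ARSENIC if arsenic else 1 - PR_ARSENIC
        if "death" not in do:
            p = PR_DEATH[arsenic]
            w *= p if death else 1 - p
        weights[(arsenic, death)] = w
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def prob(measures, attribute):
    """Probability that the given boolean attribute is true."""
    index = {"arsenic": 0, "death": 1}[attribute]
    return sum(m for world, m in measures.items() if world[index])

prior = prob(worlds(), "arsenic")
after_obs = prob(worlds(obs={"death": True}), "arsenic")
after_do = prob(worlds(do={"death": True}), "arsenic")
print(prior, after_obs, after_do)
```

Observing death raises the belief in arsenic above the prior, while `do` leaves it at the prior; feeding or observing arsenic gives the same probability of death.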



Note that even though the idea of action based updates comes from Pearl, our treatment of actions is technically
different from his. In Pearl's approach, the semantics of the do operator are given in terms of operations on graphs
(specifically, removing from the graph all directed links leading into the acted-upon variable). In our approach the
semantics of do are given by non-monotonic axioms (9) and (10), which are introduced by our semantics as part
of the translation of P-log programs into ASP. These axioms are triggered by the addition of $do(a(t) = y)$ to the
program.

4.3 More Complex Updates

Now we illustrate updating the agent's knowledge by more complex regular rules and by probabilistic information.

Example 19 

[Adding defined attributes] 

In this example we show how updates can be used to expand the vocabulary of the original program. Consider for
instance a program $T_1$ from the die example (Example 5). An update, consisting of the rules

    max_score : boolean.

    max_score ← score(d1) = 6, score(d2) = 6.

introduces a new boolean attribute, max_score, which holds iff both dice roll the max score. The probability of
max_score is equal to the product of probabilities of $score(d_1) = 6$ and $score(d_2) = 6$. □



Example 20 

[Adding new rules] 
Consider a P-log program $T$

    d = {1, 2}.

    p : d → boolean.

    random(p(X)).

The program has four possible worlds: $W_1 = \{p(1), p(2)\}$, $W_2 = \{\neg p(1), p(2)\}$, $W_3 = \{p(1), \neg p(2)\}$, $W_4 =
\{\neg p(1), \neg p(2)\}$. It is easy to see that $P_T(p(1)) = 1/2$. What would be the probability of $p(1)$ if $p(1)$ and $p(2)$
were mutually exclusive? To answer this question we can compute $P_{T \cup B}(p(1))$ where

    B = {¬p(1) ← p(2);  ¬p(2) ← p(1)}.

Since $T \cup B$ has three possible worlds, $W_2$, $W_3$, $W_4$, we have that $P_{T \cup B}(p(1)) = 1/3$. The new evidence forced
the reasoner to change the probability from 1/2 to 1/3. □
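
The effect of the update in Example 20 can be checked by enumerating the four worlds. The Python sketch below is illustrative only. It treats the two rules of $B$ as a filter that removes the world containing both $p(1)$ and $p(2)$, which is a simplification that is adequate for this example but is not a general account of rule updates. The names `worlds` and `prob_p1` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def worlds(constraint=lambda p1, p2: True):
    """Possible worlds (p(1), p(2)) with normalized measures.

    Both attributes are random and have no pr-atoms, so every world
    gets the same unnormalized probability 1/2 * 1/2.
    """
    kept = {w: Fraction(1, 4)
            for w in product([True, False], repeat=2)
            if constraint(*w)}
    total = sum(kept.values())
    return {w: m / total for w, m in kept.items()}

def prob_p1(measures):
    """Probability that p(1) holds."""
    return sum(m for (p1, _), m in measures.items() if p1)

# The rules of B make any world with both p(1) and p(2) inconsistent.
mutually_exclusive = lambda p1, p2: not (p1 and p2)

before = prob_p1(worlds())
after = prob_p1(worlds(mutually_exclusive))
print(before, after)
```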

The next example shows how a new update can force the reasoner to view a previously non-random attribute as 
random. 



Example 21 

[Adding Randomness] 

Consider $T$ consisting of the rules:

    a1, a2, a3 : boolean.

    a1 ← a2.

    a2 ← not ¬a2.

The program has one possible world, $W = \{a_1, a_2\}$.
Now let us update $T$ by $B$ of the form:

    ¬a2.

    random(a1) ← ¬a2.

The new program, $T \cup B$, has two possible worlds
$W_1 = \{a_1, \neg a_2\}$ and
$W_2 = \{\neg a_1, \neg a_2\}$.

Example 22

[Adding Causal Probability]

Consider programs $T_1$ consisting of the rules:

    a : boolean.
    random(a).

and $T_2$ consisting of the rules:

    a : boolean.
    random(a).
    pr(a) = 1/2.

The programs have the same possible worlds, $W_1 = \{a\}$ and $W_2 = \{\neg a\}$, and the same probability functions
assigning 1/2 to $W_1$ and $W_2$. The programs however behave differently under the simple update $U = \{pr(a) = 1/3\}$.
The updated $T_1$ simply assigns probability 1/3 and 2/3 to $W_1$ and $W_2$ respectively. In contrast the attempt to apply
the same update to $T_2$ fails, since the resulting program violates Condition 2 from Section 3.2. This behavior may shed
some light on the principle of indifference. According to (Jr and Teng 2001), "One of the oddities of the principle of
indifference is that it yields the same sharp probabilities for a pair of alternatives about which we know nothing at
all as it does for the alternative outcomes of a toss of a thoroughly balanced and tested coin". The former situation
is reflected in $T_1$ where the principle of indifference is used to assign default probabilities. The latter case is captured
by $T_2$, where $pr(a) = 1/2$ is the result of some investigation. Correspondingly the update $U$ of $T_1$ is viewed as
simple additional knowledge, the result of study and testing. The same update to $T_2$ contradicts the established
knowledge and requires revision of the program. □



It is important to notice that an update in P-log cannot contradict original background information. An attempt to
add $\neg a$ to a program containing $a$ or to add $pr(a) = 1/2$ to a program containing $pr(a) = 1/3$ would result in
an incoherent program. It is possible to expand P-log to allow such new information (referred to as "revision" in 
the literature) but the exact revision strategy seems to depend on particular situations. If the later information is 
more trustworthy then one strategy is justified. If old and new information are "equally valid", or the old one is 
preferable then other strategies are needed. The classification of such revisions and development of the theory of 
their effects is however beyond the scope of this paper. 



5 Representing knowledge in P-log 

This section describes several examples of the use of P-log for formalization of logical and probabilistic reasoning. 
We do not claim that the problems are impossible to solve without P-log; indeed, with some intelligence and effort, 
each of the examples could be treated using a number of different formal languages, or using no formal language 
at all. The distinction claimed for the P-log solutions is that they arise directly from transcribing our knowledge 
of the problem, in a form which bears a straightforward resemblance to a natural language description of the same 
knowledge. The "straightforwardness" includes the fact that as additional knowledge is gained about a problem, it 
can be represented by adding to the program, rather than by modifying existing code. All of the examples of this 
section have been run on our P-log interpreter.

5.1 Monty Hall problem

We start by solving the Monty Hall Problem, which gets its name from the TV game show hosted by Monty Hall
(we follow the description from http://www.io.com/~kmellis/monty.html). A player is given the opportunity to
select one of three closed doors, behind one of which there is a prize. Behind the other two doors are empty rooms. 
Once the player has made a selection, Monty is obligated to open one of the remaining closed doors which does 
not contain the prize, showing that the room behind it is empty. He then asks the player if he would like to switch 
his selection to the other unopened door, or stay with his original choice. Here is the problem: does it matter if he 
switches? 

The answer is YES. In fact switching doubles the player's chance to win. This problem is quite interesting, because
the answer is felt by most people, often including mathematicians, to be counter-intuitive. Most people
almost immediately come up with a (wrong) negative answer and are not easily persuaded that they made a mistake.
We believe that part of the reason for the difficulty is some disconnect between modeling probabilistic and
non-probabilistic knowledge about the problem. In P-log this disconnect disappears, which leads to a natural correct
solution. In other words, the standard probability formalisms lack the ability to explicitly represent certain
non-probabilistic knowledge that is needed in solving this problem. In the absence of this knowledge, wrong conclusions
are made. This example is meant to show how P-log can be used to avoid this problem by allowing us to
specify relevant knowledge explicitly. Technically this is done by using a random attribute open with the dynamic
range defined by regular logic programming rules.

The domain contains the set of three doors and three 0-arity attributes, selected, open and prize. This will be 
represented by the following P-log declarations (the numbers are not part of the declaration; we number statements 
so that we can refer back to them): 

    1. doors = {1, 2, 3}.

    2. open, selected, prize : doors.

The regular part contains rules that state that Monty can open any door to a room which is not selected and which 
does not contain the prize. 

    3. ¬can_open(D) ← selected = D.

    4. ¬can_open(D) ← prize = D.

    5. can_open(D) ← not ¬can_open(D).

The first two rules are self-explanatory. The last rule, which uses both classical and default negations, is a typical
ASP representation of the closed world assumption (Reiter 1978): Monty can open any door except those which
are explicitly prohibited.

Assuming the player selects a door at random, the probabilistic information about the three attributes of doors can 
be now expressed as follows: 

    6. random(prize).

    7. random(selected).

    8. random(open : {X : can_open(X)}).

Notice that rule (8) guarantees that Monty selects only those doors which can be opened according to rules (3)-(5).
The knowledge expressed by these rules (which can be extracted from the specification of the problem) is often
not explicitly represented in probabilistic formalisms, leading reasoners (who usually do not realize this) to insist
that their wrong answer is actually correct.

The P-log program $\Pi_{monty0}$ consisting of the logical rules (1)-(8) represents our knowledge of the problem domain.
It has the following 12 possible worlds:

    W1  = {selected = 1, prize = 1, open = 2, ...}.
    W2  = {selected = 1, prize = 1, open = 3, ...}.
    W3  = {selected = 1, prize = 2, open = 3, ...}.
    W4  = {selected = 1, prize = 3, open = 2, ...}.
    W5  = {selected = 2, prize = 1, open = 3, ...}.
    W6  = {selected = 2, prize = 2, open = 1, ...}.
    W7  = {selected = 2, prize = 2, open = 3, ...}.
    W8  = {selected = 2, prize = 3, open = 1, ...}.
    W9  = {selected = 3, prize = 1, open = 2, ...}.
    W10 = {selected = 3, prize = 2, open = 1, ...}.
    W11 = {selected = 3, prize = 3, open = 1, ...}.
    W12 = {selected = 3, prize = 3, open = 2, ...}.

According to our definitions they will be assigned various probability measures. For instance, selected has three
possible values in each $W_i$, none of which has assigned probabilities. Hence, according to the definition of the
probability of an atom in a possible world from Section 3,

$$P(W_i, selected = j) = 1/3$$

for each $i$ and $j$. Similarly for prize

$$P(W_i, prize = j) = 1/3$$

Consider $W_1$. Since $can\_open(1) \notin W_1$ the atom $open = 1$ is not possible in $W_1$ and the corresponding
probability $P(W_1, open = 1)$ is undefined. The only possible values of open in $W_1$ are 2 and 3. Since they have no
assigned probabilities

$$P(W_1, open = 2) = PD(W_1, open = 2) = 1/2$$
$$P(W_1, open = 3) = PD(W_1, open = 3) = 1/2$$

Now consider $W_4$. $W_4$ contains $can\_open(2)$ and no other can_open atoms. Hence the only possible value of
open in $W_4$ is 2, and therefore

$$P(W_4, open = 2) = PD(W_4, open = 2) = 1$$

The computations of other values of $P(W_i, open = j)$ are similar.

Now to proceed with the story, first let us eliminate an orthogonal problem of modeling time by assuming that we
observed that the player has already selected door 1, and Monty opened door 2 revealing that it did not contain the
prize. This is expressed as:

    obs(selected = 1).  obs(open = 2).  obs(prize ≠ 2).



Let us refer to the above P-log program as $\Pi_{monty1}$. Because of the observations $\Pi_{monty1}$ has two possible worlds,
$W_1$ and $W_4$: the first containing $prize = 1$ and the second containing $prize = 3$. It follows that

$$\hat{\mu}(W_1) = P(W_1, selected = 1) \times P(W_1, prize = 1) \times P(W_1, open = 2) = 1/18$$

$$\hat{\mu}(W_4) = P(W_4, selected = 1) \times P(W_4, prize = 3) \times P(W_4, open = 2) = 1/9$$

$$\mu(W_1) = \frac{1/18}{1/18 + 1/9} = 1/3$$

$$\mu(W_4) = \frac{1/9}{1/18 + 1/9} = 2/3$$

$$P_{\Pi_{monty1}}(prize = 1) = \mu(W_1) = 1/3$$

$$P_{\Pi_{monty1}}(prize = 3) = \mu(W_4) = 2/3$$

Changing doors doubles the player's chance to win.

Now consider a situation when the player assumes (either consciously or without consciously realizing it) that
Monty could have opened any one of the unopened doors (including one which contains the prize). Then the
corresponding program will have a new definition of can_open. The rules (3)-(5) will be replaced by

    ¬can_open(D) ← selected = D.
    can_open(D) ← not ¬can_open(D).

The resulting program $\Pi_{monty2}$ will also have two possible worlds containing $prize = 1$ and $prize = 3$ respectively,
each with unnormalized probability of 1/18, and therefore $P_{\Pi_{monty2}}(prize = 1) = 1/2$ and $P_{\Pi_{monty2}}(prize = 3) = 1/2$.
In that case changing the door will not increase the probability of getting the prize.

Program $\Pi_{monty1}$ has no explicit probabilistic information and so the possible results of each random selection are
assumed to be equally likely. If we learn, for example, that given a choice between opening doors 2 and 3, Monty
opens door 2 four times out of five, we can incorporate this information by the following statement:

    9. pr(open = 2 |c can_open(2), can_open(3)) = 4/5.

A computation similar to the one above shows that changing doors still increases the player's chances to win. Of
course none of the above computations need be carried out by hand. The interpreter will do them automatically.
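
For readers without access to the interpreter, the three variants discussed above (the original rules, the weakened definition of can_open, and the biased host of statement (9)) can be checked by enumeration. The Python sketch below is a hand-coded illustration of this particular program, not the P-log interpreter. The function names and the parameters `host_avoids_prize` and `pr_open2` are hypothetical, and the default-probability handling is written out by hand under the assumption that unassigned values share the remaining probability equally.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def can_open(selected, prize, host_avoids_prize=True):
    """Doors Monty may open: rules (3)-(5), or the weaker variant
    without rule (4) when host_avoids_prize is False."""
    blocked = {selected}
    if host_avoids_prize:
        blocked.add(prize)
    return [d for d in DOORS if d not in blocked]

def open_distribution(options, pr_open2=None):
    """P(W, open = d) for each openable door d.

    Uniform by default. If pr_open2 is given and doors 2 and 3 can both
    be opened, door 2 gets pr_open2 and the other options share the rest.
    """
    if pr_open2 is not None and 2 in options and 3 in options:
        others = [d for d in options if d != 2]
        dist = {d: (1 - pr_open2) / len(others) for d in others}
        dist[2] = pr_open2
        return dist
    return {d: Fraction(1, len(options)) for d in options}

def possible_worlds(host_avoids_prize=True, pr_open2=None):
    """Map (selected, prize, open) to its unnormalized probability."""
    worlds = {}
    for selected, prize in product(DOORS, repeat=2):
        options = can_open(selected, prize, host_avoids_prize)
        for opened, p_open in open_distribution(options, pr_open2).items():
            worlds[(selected, prize, opened)] = Fraction(1, 9) * p_open
    return worlds

def prize_distribution(worlds, selected=1, opened=2):
    """Apply obs(selected), obs(open), obs(prize != open); normalize."""
    kept = {w: m for w, m in worlds.items()
            if w[0] == selected and w[2] == opened and w[1] != opened}
    total = sum(kept.values())
    return {prize: m / total for (_, prize, _), m in kept.items()}

monty1 = prize_distribution(possible_worlds())
monty2 = prize_distribution(possible_worlds(host_avoids_prize=False))
biased = prize_distribution(possible_worlds(pr_open2=Fraction(4, 5)))
print(monty1, monty2, biased)
```

With the original rules the prize is behind door 3 with probability 2/3; with the weakened rules both doors are equally likely; with the 4/5 bias switching still wins with probability 5/9.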

In fact changing doors is advisable as long as each of the available doors can be opened with some positive 
probability. Note that our interpreter cannot prove this general result even though it will give proper advice for any 
fixed values of the probabilities. 

The problem can of course be generalized to an arbitrary number n of doors simply by replacing rule (1) with 

doors = {1, . . . , n}. 



5.2 Simpson's paradox



Let us consider the following story from (Pearl 2000): A patient is thinking about trying an experimental drug and
decides to consult a doctor. The doctor has tables of the recovery rates that have been observed among males and
females, taking and not taking the drug.

Males:

               fraction_of_population    recovery_rate
    drug       3/8                       60%
    ¬drug      1/8                       70%

Females:

               fraction_of_population    recovery_rate
    drug       1/8                       20%
    ¬drug      3/8                       30%



What should the doctor's advice be? Assuming that the patient is a male, the doctor may attempt to reduce the 
problem to checking the following inequality involving classical conditional probabilities: 



$$P(recover \mid male, \neg drug) < P(recover \mid male, drug) \qquad (16)$$

The corresponding probabilities, if directly calculated from the tables, are 0.7 and 0.6. The inequality fails, and
hence the advice is not to take the drug. A similar argument shows that a female patient should not take the drug.

But what should the doctor do if he has forgotten to ask the patient's sex? Following the same reasoning, the doctor
might check whether the following inequality is satisfied:

$$P(recover \mid \neg drug) < P(recover \mid drug) \qquad (17)$$

This will lead to an unexpected result. $P(recover \mid drug) = 0.5$ while $P(recover \mid \neg drug) = 0.4$. The drug
seems to be beneficial to patients of unknown sex, though similar reasoning has shown that the drug is harmful
to the patients of known sex, whether they are male or female!
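
The two pooled values can be recovered from the tables by weighting each recovery rate by the fraction of the population in its row. The following is a sketch of that arithmetic, under the assumption that the fractions in the tables are read as joint probabilities of sex and treatment:

```latex
\begin{align*}
P(recover \mid drug)
  &= \frac{\tfrac{3}{8}\cdot 0.6 + \tfrac{1}{8}\cdot 0.2}{\tfrac{3}{8} + \tfrac{1}{8}}
   = \frac{0.25}{0.5} = 0.5, \\
P(recover \mid \neg drug)
  &= \frac{\tfrac{1}{8}\cdot 0.7 + \tfrac{3}{8}\cdot 0.3}{\tfrac{1}{8} + \tfrac{3}{8}}
   = \frac{0.2}{0.5} = 0.4 .
\end{align*}
```

The pooled rate for the drug is dominated by males, who recover more often regardless of treatment, which is why it exceeds the pooled rate without the drug.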

This phenomenon is known as Simpson's Paradox: conditioning on A may increase the probability of B among 
the general population, while decreasing the probability of B in every subpopulation (or vice-versa). In the current 
context, the important and perhaps surprising lesson is that classical conditional probabilities do not faithfully 
formalize what we really want to know: what will happen if we do X? In (Pearl 2000) Pearl suggests a solution
to this problem in which the effect of deliberate action $A$ on condition $C$ is represented by $P(C \mid do(A))$, a
quantity defined in terms of graphs describing causal relations between variables. Correct reasoning therefore
should be based on evaluating the inequality

$$P(recover \mid do(\neg drug)) < P(recover \mid do(drug)) \qquad (18)$$

instead of (17); this is also what should have been done for (16).

To calculate (18) using Pearl's approach one needs a causal model and it should be noted that multiple causal
models may be consistent with the same statistical data. P-log allows us to express causality and we can determine
the probability of a formula $C$, given that action $A$ is performed, by computing $P_{\Pi \cup \{do(A)\}}(C)$.

Using the tables and the added assumption about the direction of causality between the variables (a different
assumption about the direction of causality may lead to a different conclusion), we have the values
of the following causal probabilities:

    pr(male) = 0.5.
    pr(recover |c male, drug) = 0.6.
    pr(recover |c male, ¬drug) = 0.7.
    pr(recover |c ¬male, drug) = 0.2.
    pr(recover |c ¬male, ¬drug) = 0.3.
    pr(drug |c male) = 0.75.
    pr(drug |c ¬male) = 0.25.

These statements, together with declarations:

    male, recover, drug : boolean.
    [1] random(male).
    [2] random(recover).
    [3] random(drug).

constitute a P-log program, $\Pi$, that formalizes the story.

For comparison, if the tables are treated as giving probabilistic information, then we get the following:
$P(male) = P(\neg male) = 0.5$. $P(drug) = P(\neg drug) = 0.5$. $P(recover \mid male, drug) = 0.6$.
$P(recover \mid male, \neg drug) = 0.7$. $P(recover \mid \neg male, drug) = 0.2$. $P(recover \mid \neg male, \neg drug) = 0.3$.
$P(drug \mid male) = 0.75$. $P(drug \mid \neg male) = 0.25$.

The program describes eight possible worlds containing various values of the attributes. Each of these worlds and
their unnormalized and normalized probabilities is calculated below.

$W_1 = \{male, recover, drug\}$. $\hat{\mu}(W_1) = 0.5 \times 0.6 \times 0.75 = 0.225$. $\mu(W_1) = 0.225$.

$W_2 = \{male, recover, \neg drug\}$. $\hat{\mu}(W_2) = 0.5 \times 0.7 \times 0.25 = 0.0875$. $\mu(W_2) = 0.0875$.

$W_3 = \{male, \neg recover, drug\}$. $\hat{\mu}(W_3) = 0.5 \times 0.4 \times 0.75 = 0.15$. $\mu(W_3) = 0.15$.

$W_4 = \{male, \neg recover, \neg drug\}$. $\hat{\mu}(W_4) = 0.5 \times 0.3 \times 0.25 = 0.0375$. $\mu(W_4) = 0.0375$.

$W_5 = \{\neg male, recover, drug\}$. $\hat{\mu}(W_5) = 0.5 \times 0.2 \times 0.25 = 0.025$. $\mu(W_5) = 0.025$.

$W_6 = \{\neg male, recover, \neg drug\}$. $\hat{\mu}(W_6) = 0.5 \times 0.3 \times 0.75 = 0.1125$. $\mu(W_6) = 0.1125$.

$W_7 = \{\neg male, \neg recover, drug\}$. $\hat{\mu}(W_7) = 0.5 \times 0.8 \times 0.25 = 0.1$. $\mu(W_7) = 0.1$.

$W_8 = \{\neg male, \neg recover, \neg drug\}$. $\hat{\mu}(W_8) = 0.5 \times 0.7 \times 0.75 = 0.2625$. $\mu(W_8) = 0.2625$.

Now let us compute $P_{\Pi_1}(recover)$ and $P_{\Pi_2}(recover)$ respectively, where $\Pi_1 = \Pi \cup \{do(drug)\}$ and $\Pi_2 =
\Pi \cup \{do(\neg drug)\}$.

The four possible worlds of $\Pi_1$ and their unnormalized and normalized probabilities are as follows:

$W_1' = \{male, recover, drug\}$. $\hat{\mu}(W_1') = 0.5 \times 0.6 \times 1 = 0.3$. $\mu(W_1') = 0.3$.
$W_3' = \{male, \neg recover, drug\}$. $\hat{\mu}(W_3') = 0.5 \times 0.4 \times 1 = 0.2$. $\mu(W_3') = 0.2$.
$W_5' = \{\neg male, recover, drug\}$. $\hat{\mu}(W_5') = 0.5 \times 0.2 \times 1 = 0.1$. $\mu(W_5') = 0.1$.
$W_7' = \{\neg male, \neg recover, drug\}$. $\hat{\mu}(W_7') = 0.5 \times 0.8 \times 1 = 0.4$. $\mu(W_7') = 0.4$.

From the above we obtain $P_{\Pi_1}(recover) = 0.4$.

The four possible worlds of $\Pi_2$ and their unnormalized and normalized probabilities are as follows:

$W_2' = \{male, recover, \neg drug\}$. $\hat{\mu}(W_2') = 0.5 \times 0.7 \times 1 = 0.35$. $\mu(W_2') = 0.35$.
$W_4' = \{male, \neg recover, \neg drug\}$. $\hat{\mu}(W_4') = 0.5 \times 0.3 \times 1 = 0.15$. $\mu(W_4') = 0.15$.
$W_6' = \{\neg male, recover, \neg drug\}$. $\hat{\mu}(W_6') = 0.5 \times 0.3 \times 1 = 0.15$. $\mu(W_6') = 0.15$.
$W_8' = \{\neg male, \neg recover, \neg drug\}$. $\hat{\mu}(W_8') = 0.5 \times 0.7 \times 1 = 0.35$. $\mu(W_8') = 0.35$.

From the above we obtain $P_{\Pi_2}(recover) = 0.5$. Hence, if one assumes the direction of causality that we assumed,
it is better not to take the drug than to take the drug.

Similar calculations also show the following: 

$$P_{\Pi \cup \{obs(male), do(drug)\}}(recover) = 0.6$$
$$P_{\Pi \cup \{obs(male), do(\neg drug)\}}(recover) = 0.7$$

$$P_{\Pi \cup \{obs(\neg male), do(drug)\}}(recover) = 0.2$$

$$P_{\Pi \cup \{obs(\neg male), do(\neg drug)\}}(recover) = 0.3$$

I.e., if we know the person is male then it is better not to take the drug than to take the drug, the same if we know 
the person is female, and both agree with the case when we do not know if the person is male or female. 
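
The difference between conditioning on drug and acting with do(drug) can be reproduced numerically from the causal probabilities listed above. The Python sketch below is an illustration written for this one program, not the P-log interpreter. It assumes that an action fixes the value of drug and removes its random factor, and the names `measures` and `p_recover` are hypothetical helpers.

```python
from itertools import product

# Causal probabilities of the program (pr statements of the story).
PR_MALE = 0.5
PR_DRUG = {True: 0.75, False: 0.25}                    # keyed by male
PR_RECOVER = {(True, True): 0.6, (True, False): 0.7,   # keyed by (male, drug)
              (False, True): 0.2, (False, False): 0.3}

def measures(obs=None, do_drug=None):
    """Return {(male, drug, recover): measure} after an optional update.

    obs is a dict of observed attribute values, which only filters
    worlds. do_drug fixes the value of drug by an action, so drug no
    longer contributes a probabilistic factor.
    """
    obs = obs or {}
    weights = {}
    for male, drug, recover in product([True, False], repeat=3):
        world = {"male": male, "drug": drug, "recover": recover}
        if any(world[k] != v for k, v in obs.items()):
            continue
        if do_drug is not None and drug != do_drug:
            continue
        w = PR_MALE if male else 1 - PR_MALE
        if do_drug is None:
            w *= PR_DRUG[male] if drug else 1 - PR_DRUG[male]
        p = PR_RECOVER[(male, drug)]
        w *= p if recover else 1 - p
        weights[(male, drug, recover)] = w
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def p_recover(**update):
    """Probability of recover under the given update."""
    return sum(m for (_, _, rec), m in measures(**update).items() if rec)

for drug in (True, False):
    print("drug" if drug else "no drug",
          round(p_recover(obs={"drug": drug}), 4),   # classical conditioning
          round(p_recover(do_drug=drug), 4))         # intervention
```

Conditioning favors the drug, while the intervention favors not taking it, in agreement with the values computed for males and females separately.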

The example shows that queries of the form "What will happen if we do X?" can be easily stated and answered
in P-log. The necessary P-log reasoning is nonmonotonic and is based on rules (9) and (10) from the definition of
$\tau(\Pi)$.



5.3 A Moving Robot 

Now we consider a formalization of a problem whose original version, not containing probabilistic reasoning, first
appeared in (Iwan and Lakemeyer 2002).

There are rooms, say $r_0$, $r_1$, $r_2$, reachable from the current position of a robot. The rooms can be open or closed. The
robot cannot open the doors. It is known that the robot navigation is usually successful. However, a malfunction
can cause the robot to go off course and enter any one of the open rooms.

We want to be able to use our formalization for correctly answering simple questions about the robot's behavior
including the following scenario: the robot moved toward open room $r_1$ but found itself in some other room. What
room can this be?

As usual we start with formalizing this knowledge. We need the initial and final moments of time, the rooms, and
the actions.

    time = {0, 1}.
    rooms = {r0, r1, r2}.

We will need actions:

    go_in : rooms → boolean.
    break : boolean.
    ab : boolean.

The first action consists of the robot attempting to enter the room R at time step 0. The second is an exogenous
breaking action which may occur at moment 0 and alter the outcome of this attempt. In what follows, (possibly
indexed) variables R will be used for rooms.

A state of the domain will be modeled by a time-dependent attribute, in, and a time independent attribute open.
(Time dependent attributes and relations are often referred to as fluents).

    open : rooms → boolean.

    in : time → rooms.

The description of dynamic behavior of the system will be given by the rules below.

The first two rules state that the robot navigation is usually successful, and a malfunctioning robot constitutes an
exception to this default.

    1. in(1) = R ← go_in(R), not ab.

    2. ab ← break.

The random selection rule (3) below plays the role of a (non-deterministic) causal law. It says that a malfunctioning
robot can end up in any one of the open rooms.

    3. [r] random(in(1) : {R : open(R)}) ← go_in(R), break.

We also need inertia axioms for the fluent in.

    4a. in(1) = R ← in(0) = R, not ¬in(1) = R.
    4b. in(1) ≠ R ← in(0) ≠ R, not in(1) = R.

Finally, we assume that only closed doors will be specified in the initial situation. Otherwise doors are assumed to
be open.

    5. open(R) ← not ¬open(R).

The resulting program, $\Pi_0$, completes the first stage of our formalization. The program will be used in conjunction
with a collection $X$ of atoms of the form $in(0) = R$, $\neg open(R)$, $go\_in(R)$, $break$ which satisfies the following
conditions: $X$ contains at most one atom of the form $in(0) = R$ (robot cannot be in two rooms at the same time);
$X$ has at most one atom of the form $go\_in(R)$ (robot cannot move to more than one room); $X$ does not contain a
pair of atoms of the form $\neg open(R)$, $go\_in(R)$ (robot does not attempt to enter a closed room); and $X$ does not
contain a pair of atoms of the form $\neg open(R)$, $in(0) = R$ (robot cannot start in a closed room). A set $X$ satisfying
these properties will be normally referred to as a valid input of $\Pi_0$.

Given an input $X_1 = \{go\_in(r_0)\}$ the program $\Pi_0 \cup X_1$ will correctly conclude $in(1) = r_0$. The input
$X_2 = \{go\_in(r_0), break\}$ will result in three possible worlds containing $in(1) = r_0$, $in(1) = r_1$ and $in(1) = r_2$
respectively. If, in addition, we are given $\neg open(r_2)$ the third possible world will disappear, etc.

Now let us expand $\Pi_0$ by some useful probabilistic information. We can for instance consider $\Pi_1$ obtained from
$\Pi_0$ by adding:

    8. pr_r(in(1) = R |c go_in(R), break) = 1/2.

(Note that for any valid input $X$, Condition 3 of Section 3 is satisfied for $\Pi_1 \cup X$, since rooms are assumed to
be open by default and no valid input may contain $\neg open(R)$ and $go\_in(R)$ for any $R$.) Program $T_1 = \Pi_1 \cup X_1$
has the unique possible world which contains $in(1) = r_0$. Hence, $P_{T_1}(in(1) = r_0) = 1$.

Now consider $T_2 = \Pi_1 \cup X_2$. It has three possible worlds: $W_0$ containing $in(1) = r_0$, and $W_1$, $W_2$ containing
$in(1) = r_1$ and $in(1) = r_2$ respectively. $P_{T_2}(W_0)$ is assigned a probability of 1/2, while $P_{T_2}(W_1) = P_{T_2}(W_2) = 1/4$
by default. Therefore $P_{T_2}(in(1) = r_0) = 1/2$. Here the addition of break to the knowledge base changed the
degree of the reasoner's belief in $in(1) = r_0$ from 1 to 1/2. This is not possible in classical Bayesian updating, for
two reasons. First, the prior probability of break is 0 and hence it cannot be conditioned upon. Second, the prior
probability of $in(1) = r_0$ is 1 and hence cannot be diminished by classical conditioning. To account for this change
in the classical framework requires the creation of a new probabilistic model. However, each model is a function
of the underlying background knowledge; and so P-log allows us to represent the change in the form of an update.
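
The way the assigned probability 1/2 and the default probabilities 1/4 arise can be illustrated with a small calculation. The Python sketch below is a hand-written approximation of the default-probability rule for this example only: assigned values keep their probability and the remaining mass is split evenly among the other possible values. The function names `default_distribution` and `in1_distribution` are hypothetical and are not part of any P-log implementation.

```python
from fractions import Fraction

def default_distribution(possible_values, assigned):
    """Keep the assigned probabilities and split the remaining mass
    evenly among the possible values that have no assignment."""
    unassigned = [v for v in possible_values if v not in assigned]
    remaining = 1 - sum(assigned.values())
    dist = dict(assigned)
    for v in unassigned:
        dist[v] = Fraction(remaining) / len(unassigned)
    return dist

def in1_distribution(target, open_rooms, broken):
    """Distribution of in(1) after go_in(target).

    Without a break, rule 1 sends the robot to the target room. With a
    break, in(1) is random over the open rooms and the target keeps the
    assigned probability 1/2.
    """
    if not broken:
        return {target: Fraction(1)}
    return default_distribution(open_rooms, {target: Fraction(1, 2)})

rooms = ["r0", "r1", "r2"]
t1 = in1_distribution("r0", rooms, broken=False)
t2 = in1_distribution("r0", rooms, broken=True)
t2_closed = in1_distribution("r0", ["r0", "r1"], broken=True)  # with ¬open(r2)
print(t1, t2, t2_closed)
```

Adding break moves the probability of $in(1) = r_0$ from 1 to 1/2, and closing $r_2$ shifts all of the remaining mass to $r_1$.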



5.4 Bayesian squirrel 

In this section we consider an example from ( [Hilborn and Mangel 1997) used to illustrate the notion of Bayesian 
learning. One common type of learning problem consists of selecting from a set of models for a random phe- 
nomenon by observing repeated occurrences of the phenomenon. The Bayesian approach to this problem is to 
begin with a "prior density" on the set of candidate models and update it in light of our observations. 

As an example, Hilborn and Mangel describe the Bayesian squirrel. The squirrel has hidden its acorns in one of
two patches, say Patch 1 and Patch 2, but can't remember which. The squirrel is 80% certain the food is hidden in
Patch 1. Also, it knows there is a 20% chance of finding food per day when it is looking in the right patch (and, of
course, a 0% probability if it's looking in the wrong patch).

To represent this knowledge in a P-log program $\Pi$ we introduce sorts

$patch = \{p_1, p_2\}.$
$day = \{1 \ldots n\}.$

(where $n$ is some constant, say, 5)
and attributes

$hidden\_in : patch.$

$found : patch * day \rightarrow boolean.$

$look : day \rightarrow patch.$

Attribute $hidden\_in$ is always random. Hence we include

$[r_1]\ random(hidden\_in).$

$found$ is random only if the squirrel is looking for food in the right patch, i.e. we have

$[r_2]\ random(found(P, D)) \leftarrow hidden\_in = P,\ look(D) = P.$

The regular part of the program consists of the closed world assumption for found: 

$\neg found(P, D) \leftarrow not\ found(P, D).$

Probabilistic information of the story is given by statements:

$pr_{r_1}(hidden\_in = p_1) = 0.8.$

$pr_{r_2}(found(P, D)) = 0.2.$

This knowledge, in conjunction with description of the squirrel's activity, can be used to compute probabilities of 
possible outcomes of the next search for food. 

Consider for instance program $\Pi_1 = \Pi \cup \{do(look(1) = p_1)\}$. The program has three possible worlds

$W_1 = \{look(1) = p_1, hidden\_in = p_1, found(p_1, 1), \ldots\}$,
$W_2 = \{look(1) = p_1, hidden\_in = p_1, \neg found(p_1, 1), \ldots\}$,
$W_3 = \{look(1) = p_1, hidden\_in = p_2, \neg found(p_1, 1), \ldots\}$,

with probability measures $\mu(W_1) = 0.16$, $\mu(W_2) = 0.64$, $\mu(W_3) = 0.2$.
As expected

$P_{\Pi_1}(hidden\_in = p_1) = 0.8$, and
$P_{\Pi_1}(found(p_1, 1)) = 0.16$.

Suppose now that the squirrel failed to find its food during the first day, and decided to continue her search in the 
first patch next morning. 

The failure to find food on the first day should decrease the squirrel's degree of belief that the food is hidden in
patch one, and consequently decrease her degree of belief that she will find food by looking in the first patch
again. This is reflected in the following computation:

Let $\Pi_2 = \Pi_1 \cup \{obs(\neg found(p_1, 1)), do(look(2) = p_1)\}$.

The possible worlds of $\Pi_2$ are:

$W_1 = W \cup \{hidden\_in = p_1, look(2) = p_1, found(p_1, 2), \ldots\}$,
$W_2 = W \cup \{hidden\_in = p_1, look(2) = p_1, \neg found(p_1, 2), \ldots\}$,
$W_3 = W \cup \{hidden\_in = p_2, look(2) = p_1, \neg found(p_1, 2), \ldots\}$,

where $W = \{look(1) = p_1, \neg found(p_1, 1)\}$.
Their probability measures are

$\mu(W_1) = 0.128/0.84 = 0.152$, $\mu(W_2) = 0.512/0.84 = 0.61$, $\mu(W_3) = 0.2/0.84 = 0.238$.

Consequently,

$P_{\Pi_2}(hidden\_in = p_1) = 0.762$, and $P_{\Pi_2}(found(p_1, 2)) = 0.152$, and so on.
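
The numbers above can be checked by brute-force enumeration. The following Python sketch is not a P-log solver; it hard-codes the squirrel story under the assumptions stated in the text (prior 0.8 for patch 1, 0.2 chance of finding food when looking in the right patch, `found` false by the closed world assumption otherwise). The function names and the 0-indexed days are choices made here for illustration.

```python
from itertools import product

P_HIDDEN = {"p1": 0.8, "p2": 0.2}  # pr(hidden_in = p1) = 0.8; p2 gets the default
P_FOUND = 0.2                      # pr(found(P, D)) = 0.2 in the right patch


def possible_worlds(looks):
    """Yield (hidden_in, found-per-day, unnormalized measure) for given looks."""
    for hidden in P_HIDDEN:
        # found is random only on days when the squirrel looks in the right patch
        random_days = [d for d, patch in enumerate(looks) if patch == hidden]
        for outcome in product([True, False], repeat=len(random_days)):
            found = [False] * len(looks)  # closed world assumption
            measure = P_HIDDEN[hidden]
            for d, hit in zip(random_days, outcome):
                found[d] = hit
                measure *= P_FOUND if hit else 1 - P_FOUND
            yield hidden, tuple(found), measure


def probability(looks, observed, query):
    """Probability of `query` among worlds that agree with the observations."""
    worlds = [w for w in possible_worlds(looks)
              if all(w[1][d] == v for d, v in observed.items())]
    total = sum(m for _, _, m in worlds)
    return sum(m for h, f, m in worlds if query(h, f)) / total


looks = ["p1", "p1"]   # do(look(1) = p1), do(look(2) = p1)
observed = {0: False}  # obs(not found(p1, 1)); day 1 is index 0
p_hidden = probability(looks, observed, lambda h, f: h == "p1")
p_found = probability(looks, observed, lambda h, f: f[1])
print(round(p_hidden, 3), round(p_found, 3))  # 0.762 0.152
```

Worlds that contradict the observation are dropped before normalization, which is why the remaining measures are divided by 0.84.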

After a number of unsuccessful attempts to find food in the first patch the squirrel can come to the conclusion that 
food is probably hidden in the second patch and change her search strategy accordingly. 
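
How many failures it takes before patch 2 becomes the better guess can be estimated with a short calculation. This sketch assumes, as in the story, that the squirrel keeps searching patch 1 and fails every day; the function name `belief_in_patch1` is hypothetical.

```python
def belief_in_patch1(failures, prior=0.8, p_find=0.2):
    """P(hidden_in = p1) after `failures` unsuccessful searches of patch 1."""
    # Every search in the right patch must have missed.
    weight_p1 = prior * (1 - p_find) ** failures
    # Misses are certain if the food is in patch 2.
    weight_p2 = 1 - prior
    return weight_p1 / (weight_p1 + weight_p2)


days = 0
while belief_in_patch1(days) >= 0.5:
    days += 1
print(days, round(belief_in_patch1(days), 3))  # 7 0.456
```

Under these assumptions the belief in patch 1 first drops below one half after the seventh unsuccessful search.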

Notice that each new experiment changes the squirrel's probabilistic model in a non-monotonic way. That is, the set
of possible worlds resulting from each successive experiment is not merely a subset of the possible worlds of the
previous model. The program, however, is changed only by the addition of new actions and observations. Distinctive
features of P-log such as the ability to represent observations and actions, as well as conditional randomness, play
an important role in allowing the squirrel to learn new probabilistic models from experience.

For comparison, let's look at a classical Bayesian solution. If the squirrel has looked in patch 1 on day 1 and not
found food, the probability that the food is hidden in patch 1 can be computed as follows. First, by Bayes Theorem,

$$P(hidden\_in = p_1 \mid \neg found(p_1, 1)) = \frac{P(\neg found(p_1, 1) \mid hidden\_in = p_1) \cdot P(hidden\_in = p_1)}{P(\neg found(p_1, 1))}$$

The denominator can then be rewritten as follows:

$$\begin{aligned}
P(\neg found(p_1, 1)) &= P(\neg found(p_1, 1) \cap hidden\_in = p_1) + P(\neg found(p_1, 1) \cap hidden\_in = p_2) \\
&= P(\neg found(p_1, 1) \mid hidden\_in = p_1) \cdot P(hidden\_in = p_1) + P(hidden\_in = p_2) \\
&= 0.8 \cdot 0.8 + 0.2 \\
&= 0.84
\end{aligned}$$

The second equality uses the fact that $P(\neg found(p_1, 1) \mid hidden\_in = p_2) = 1$.

Substitution yields

$$P(hidden\_in = p_1 \mid \neg found(p_1, 1)) = (0.8 \cdot 0.8)/0.84 = 0.762$$

Discussion 

Note that the classical solution of this problem does not contain any formal mention of the action $look(2) = p_1$.
We must keep this informal background knowledge in mind when constructing and using the model, but it does
not appear explicitly. To consider and compare distinct action sequences, for example, would require the use of
several intuitively related but formally unconnected models. In Causal Bayesian nets (or P-log), by contrast, the
corresponding programs may be written in terms of one another using the do-operator.

In this example we see that the use of the do-operator is not strictly necessary. Even if we were choosing between
sequences of actions, the job could be done by Bayes Theorem, combined with our ability to juggle several intuitively
related but formally distinct models. In fact, if we are very clever, Bayes Theorem itself is not necessary,
for we could use our intuition of the problem to construct a new probability space, implicitly based on the
knowledge we want to condition upon.

However, though not necessary, Bayes Theorem is very useful, because it allows us to formalize subtle reasoning
within the model which would otherwise have to be performed in the informal process of creating the model(s).
Causal Bayesian nets carry this a step further by allowing us to formalize interventions in addition to observations,
and P-log yet another step by allowing the formalization of logical knowledge about a problem or family of
problems. At each step in this hierarchy, part of the informal process of creating a model is replaced by a formal
computation.

As in this case, probabilistic models are often most easily described in terms of the conditional probabilities of 
effects given their causes. From the standpoint of traditional probability theory, these conditional probabilities are
viewed as constraints on the underlying probability space. In a learning problem like the one above, Bayes Theorem 
can then be used to relate the probabilities we are given to those we want to know: namely, the probabilities of 
evidence-given-models with the probabilities of models-given-evidence. This is typically done without describing 
or even thinking about the underlying probability space, because the given conditional probabilities, together with 
Bayes Theorem, tell us all we need to know. The use of Bayes Theorem in this manner is particular to problems 
with a certain look and feel, which are loosely classified as "Bayesian learning problems".

From the standpoint of P-log things are somewhat different. Here, all probabilities are defined with respect to
bodies of knowledge, which include models and evidence in the single vehicle of a P-log program. Within this 
framework, Bayesian learning problems do not have such a distinctive quality. They are solved by writing down 
what we know and issuing a query, just like any other problem. Since P-log probabilities satisfy the axioms of 
probability, Bayes Theorem still applies and could be useful in calculating the P-log probabilities by hand. On the 
other hand, it is possible and even natural to approach these problems in P-log without mentioning Bayes Theorem. 
This would be awkward in ordinary mathematical probability, where the derivation of models from knowledge is 
considerably less systematic. 



5.5 Maneuvering the Space Shuttle 

So far we have presented a number of small examples to illustrate various features of P-log. In this section we 
outline our use of P-log for an industrial size application: diagnosing faults in the reactive control system (RCS) of
the Space Shuttle. 

To put this work in the proper perspective we need to briefly describe the history of the project. The RCS actuates 
the maneuvering of the shuttle. It consists of fuel and oxidizer tanks, valves, and other plumbing needed to provide 
propellant to the shuttle's maneuvering jets. It also includes electronic circuitry, both to control the valves in the 
fuel lines, and to prepare the jets to receive firing commands. To perform a maneuver, Shuttle controllers (i.e.,
astronauts and/or mission controllers) must find a sequence of commands which delivers propellant from tanks to 
a proper combination of jets. 

Answer Set Programming (without probabilities) was successfully used to design and implement the decision
support system USA-Advisor (Balduccini et al. 2001; Balduccini et al. 2002), which, given information about the
desired maneuver and the current state of the system (including its known faults), finds a plan allowing the controllers
to achieve this task. In addition the USA-Advisor is capable of diagnosing an unexpected behavior of
the system. The success of the project hinged on Answer Set Prolog's ability to describe controllers' knowledge
about the system, the corresponding operational procedures, and a fair amount of commonsense knowledge. It also
depended on the existence of efficient ASP solvers.

The USA-Advisor is built on a detailed but straightforward model of the RCS. For instance, the hydraulic part of
the RCS can be viewed as a graph whose nodes are labeled by tanks containing propellant, jets, junctions of pipes,
etc. Arcs of the graph are labeled by valves which can be opened or closed by a collection of switches. The graph
is described by a collection of ASP atoms of the form $connected(n_1, v, n_2)$ (valve $v$ labels the arc from $n_1$ to
$n_2$) and $controls(s, v)$ (switch $s$ controls valve $v$). The description of the system may also contain a collection of
faults, e.g. a valve can be stuck, it can be leaking, or have bad circuitry. Similar models exist for the electrical part
of the RCS and for the connection between the electrical and hydraulic parts. Overall, the system is rather complex, in
that it includes 12 tanks, 44 jets, 66 valves, 33 switches, and around 160 computer commands (computer-generated
signals).

In addition to a simple description of the RCS, USA-Advisor contains knowledge of the system's dynamic behavior.
For instance the axiom

$\neg faulty(C) \leftarrow not\ may\_be\_faulty(C).$

says that in the absence of evidence to the contrary, components of the RCS are assumed to be working properly.
(Note that concise representation of this knowledge depends critically on the ability of ASP to represent defaults.)
The axioms

$$\begin{array}{lcl}
h(state(S, open), T + 1) & \leftarrow & occurs(flip(S), T), \\
 & & h(state(S, closed), T), \\
 & & \neg faulty(S). \\
h(state(S, closed), T + 1) & \leftarrow & occurs(flip(S), T), \\
 & & h(state(S, open), T), \\
 & & \neg faulty(S).
\end{array}$$

express the direct effect of an action of flipping switch $S$. Here $state$ is a function symbol with the first parameter
ranging over switches and valves and the second ranging over their possible states; $flip$ is a function symbol
whose parameter is of type switch. Predicate symbol $h$ (holds) has its first parameter ranging over fluents and
the second one ranging over time-steps; the two parameters of $occurs$ are of type action and time-step respectively.
Note that despite the presence of function symbols our typing guarantees finiteness of the Herbrand universe of the
program. The next axiom describes the connections between positions of switches and valves.

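$$\begin{array}{lcl}
h(state(V, P), T) & \leftarrow & controls(S, V), \\
 & & h(state(S, P), T), \\
 & & \neg fault(V, stuck).
\end{array}$$

A recursive rule

$$\begin{array}{lcl}
h(pressurized(N_2), T) & \leftarrow & connected(N_1, V, N_2), \\
 & & h(pressurized(N_1), T), \\
 & & h(state(V, open), T), \\
 & & \neg fault(V, leaking).
\end{array}$$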

describes the relationship between the values of relation pressurized{N) for neighboring nodes. (Node N is 
pressurized if it is reached by a sufficient quantity of the propellant). These and other axioms, which are rooted 
in a substantial body of research on actions and change, describe a comparatively complex effect of a simple flip 
operation which propagates the pressure through the system. 
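
The recursive rule for $pressurized$ amounts to a reachability computation over the plumbing graph. The Python sketch below illustrates that reading for a single time step; it is not the USA-Advisor encoding, and the three-node line, the node and valve names, and the function name `pressurized_nodes` are all hypothetical.

```python
def pressurized_nodes(sources, connected, open_valves, leaking=frozenset()):
    """Least fixpoint of the recursive pressurized rule over the plumbing graph.

    sources     -- nodes that are pressurized to begin with (e.g. tanks)
    connected   -- triples (n1, valve, n2): valve labels the arc from n1 to n2
    open_valves -- valves currently in state open
    leaking     -- valves with a leaking fault, which block propagation
    """
    pressurized = set(sources)
    changed = True
    while changed:
        changed = False
        for n1, valve, n2 in connected:
            if (n1 in pressurized and n2 not in pressurized
                    and valve in open_valves and valve not in leaking):
                pressurized.add(n2)
                changed = True
    return pressurized


# Hypothetical line: tank -> junction -> jet.
connected = [("tank", "v1", "junction"), ("junction", "v2", "jet")]
healthy = pressurized_nodes({"tank"}, connected, {"v1", "v2"})
faulty = pressurized_nodes({"tank"}, connected, {"v1", "v2"}, leaking={"v2"})
print(sorted(healthy))  # ['jet', 'junction', 'tank']
print(sorted(faulty))   # ['junction', 'tank']
```

The loop repeats until no new node is added, mirroring how an answer set contains exactly the consequences derivable from the rule.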

The plan to execute a desired maneuver can be extracted by a simple procedural program from answer sets of
a program $\Pi_s \cup PM$, where $\Pi_s$ consists of the description of the RCS and its dynamic behavior, and $PM$ is a
"planning module," containing a statement of the goal (i.e., maneuver), and rules needed for ASP-based planning.
Similarly, the diagnosis can be extracted from answer sets of $\Pi_s \cup DM$, where the diagnostic module $DM$ contains
unexpected observations, together with axioms needed for the ASP diagnostics.

After the development of the original USA-Advisor, we learned that, as could be expected, some faults of the RCS
components are more likely than others, and, moreover, reasonable estimates of the probabilities of these faults can 
be obtained and utilized for finding the most probable diagnosis of unexpected observations. Usually this is done 
under the assumption that the number of multiple faults of the system is limited by some fixed bound. 

P-log allowed us to write software for finding such diagnoses. First we needed to expand $\Pi_s$ by the corresponding
declarations including the statement

$[r(C, F)]\ random(fault(C, F)) \leftarrow may\_be\_faulty(C).$

where $may\_be\_fault(C, F)$ is a boolean attribute which is true if component $C$ may (or may not) have a fault of
type $F$. The probabilistic information about faults is given by the pr-atoms, e.g.

$pr_{r(V, stuck)}(fault(V, stuck) \mid_c may\_be\_faulty(V)) = 0.0002.$

etc. To create a probabilistic model of our system, the ASP diagnostic module finds components relevant to the
agent's unexpected observations, and adds them to $DM$ as a collection of atoms of the form $may\_be\_faulty(c)$.
Each possible world of the resulting program (viz., $P = \Pi_s \cup DM$) uniquely corresponds to a possible explanation
of the unexpected observation. The system finds possible worlds with maximum probability measure and returns
diagnoses defined by these worlds, where an "explanation" consists of all atoms of the form $fault(c, f)$ in a
given possible world. This system works very efficiently if we assume that the maximum number, $n$, of faults in the
explanation does not exceed two (a practically realistic assumption for our task). If $n$ equals 3 the computation
is substantially slower. There are two obvious ways to improve the efficiency of the system: improve our prototype
implementation of P-log, or reduce the number of possibly faulty components returned by the original diagnostic
program, or both. We are currently working in both of these directions. It is of course important to realize that
the largest part of all these computations is not probabilistic and is performed by the ASP solvers, which are
themselves quite mature. However, the conceptual blending of ASP with probabilities achieved by P-log allowed
us to successfully express our probabilistic knowledge, and to define the corresponding probabilistic model, which
was essential for the success of the project.
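
The search for a most probable diagnosis with a bounded number of faults can be illustrated by a small Python sketch. This is not the system described above: the consistency check performed by the ASP solver is replaced by a hypothetical predicate `explains`, the component names are invented, and all probabilities except the 0.0002 quoted in the text are made-up values. The measure of a candidate is taken to be the product of the probabilities of its independent random fault outcomes.

```python
from itertools import combinations


def most_probable_diagnoses(fault_probs, explains, max_faults=2):
    """Return the explanations of maximum measure with at most max_faults faults.

    fault_probs -- {(component, fault_type): probability} for candidate faults
    explains    -- predicate telling whether a set of faults accounts for the
                   unexpected observations (stand-in for the ASP computation)
    """
    candidates = list(fault_probs)
    best, best_measure = [], 0.0
    for k in range(max_faults + 1):
        for faults in combinations(candidates, k):
            if not explains(set(faults)):
                continue
            measure = 1.0
            for f in candidates:
                p = fault_probs[f]
                measure *= p if f in faults else 1 - p
            if measure > best_measure:
                best, best_measure = [set(faults)], measure
            elif measure == best_measure:
                best.append(set(faults))
    return best, best_measure


fault_probs = {
    ("v1", "stuck"): 0.0002,   # value of the pr-atom quoted in the text
    ("v2", "leaking"): 0.001,  # hypothetical value
    ("sw1", "stuck"): 0.0005,  # hypothetical value
}


def explains(faults):
    # Hypothetical: either v1 is stuck, or v2 leaks while sw1 is stuck.
    return (("v1", "stuck") in faults
            or {("v2", "leaking"), ("sw1", "stuck")} <= faults)


best, measure = most_probable_diagnoses(fault_probs, explains)
print(best)
```

The number of candidate sets grows quickly with the bound on faults, which is consistent with the slowdown reported when the bound is raised from two to three.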



6 Proving Coherency of P-log Programs 

In this section we state theorems which can be used to show the coherency of P-log programs. The proofs of the
theorems are given in Appendix I. We begin by introducing terminology which makes it easier to state the
theorems.



6.1 Causally ordered programs 
Let $\Pi$ be a (ground) P-log program with signature $\Sigma$.

Definition 9

[Dependency relations]

Let $l_1$ and $l_2$ be literals of $\Sigma$. We say that

1. $l_1$ is immediately dependent on $l_2$, written as $l_1 \leq_i l_2$, if there is a rule $r$ of $\Pi$ such that $l_1$ occurs in the head
of $r$ and $l_2$ occurs in $r$'s body;

2. $l_1$ depends on $l_2$, written as $l_1 \leq l_2$, if the pair $(l_1, l_2)$ belongs to the reflexive transitive closure of relation
$\leq_i$;

3. An attribute term $a_1(t_1)$ depends on an attribute term $a_2(t_2)$ if there are literals $l_1$ and $l_2$ formed by $a_1(t_1)$
and $a_2(t_2)$ respectively such that $l_1$ depends on $l_2$. □
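
The "depends on" relation of Definition 9 can be computed mechanically from the rules. The sketch below assumes a simplified representation in which each ground rule is a pair (head literals, body literals) of tuples of strings; this representation, the tiny example program, and the function name `depends_on` are illustrative assumptions, not P-log syntax.

```python
def depends_on(rules):
    """Reflexive transitive closure of the immediate-dependency relation.

    rules -- list of (head, body) pairs, each a tuple of literal names
    Returns {literal: set of literals it depends on}.
    """
    literals = {l for head, body in rules for l in head + body}
    closure = {l: {l} for l in literals}  # reflexive part
    for head, body in rules:              # immediate dependencies
        for h in head:
            closure[h].update(body)
    changed = True
    while changed:                        # transitive part
        changed = False
        for l in literals:
            reachable = set().union(*(closure[m] for m in closure[l]))
            if not reachable <= closure[l]:
                closure[l] |= reachable
                changed = True
    return closure


# Hypothetical program: c <- b.  b <- a.  a.
rules = [(("c",), ("b",)), (("b",), ("a",)), (("a",), ())]
deps = depends_on(rules)
print(sorted(deps["c"]))  # ['a', 'b', 'c']
```

Dependency between attribute terms (item 3) would then be obtained by checking whether any pair of literals formed by the two terms is related in `deps`.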

Example 23
[Dependency]

Let us consider a version of the Monty Hall program consisting of rules (1) - (9) from Subsection 5.1. Let us denote
it by $\Pi_{monty3}$. From rules (3) and (4) of this program we conclude that $\neg can\_open(d)$ is immediately dependent
on $prize = d$ and $selected = d$ for every door $d$. By rule (5) we have that for every $d \in doors$, $can\_open(d)$
is immediately dependent on $\neg can\_open(d)$. By rule (8), $open = d_1$ is immediately dependent on $can\_open(d_2)$
for any $d_1, d_2 \in doors$. Finally, according to (9), $open = 2$ is immediately dependent on $can\_open(2)$ and
$can\_open(3)$. Now it is easy to see that an attribute term $open$ depends on itself and on attribute terms $prize$ and
$selected$, while each of the latter two terms depends only on itself. □

Definition 10
[Leveling function]
A leveling function, $|\ |$, of $\Pi$ maps attribute terms of $\Sigma$ onto a set $[0, n]$ of natural numbers. It is extended to other
syntactic entities over $\Sigma$ as follows:

$|a(t) = y| = |a(t) \neq y| = |not\ a(t) = y| = |not\ a(t) \neq y| = |a(t)|.$

We'll often refer to $|e|$ as the rank of $e$. Finally, if $B$ is a set of expressions then $|B| = max(\{|e| : e \in B\})$. □

Definition 11

[Strict probabilistic leveling and reasonable programs]
A leveling function $|\ |$ of $\Pi$ is called strict probabilistic if

1. no two random attribute terms of $\Sigma$ have the same level under $|\ |$;

2. for every random selection rule $[r]\ random(a(t) : \{y : p(y)\}) \leftarrow B$ of $\Pi$ we have

$|a(t) = y| \leq |\{p(y) : y \in range(a)\} \cup B|$;

3. for every probability atom $pr_r(a(t) = y \mid_c B)$ of $\Pi$ we have $|a(t)| \leq |B|$;

4. if $a_1(t_1)$ is a random attribute term, $a_2(t_2)$ is a non-random attribute term, and $a_2(t_2)$ depends on $a_1(t_1)$
then $|a_2(t_2)| \geq |a_1(t_1)|$.

A P-log program $\Pi$ which has a strict probabilistic leveling function is called reasonable. □

Example 24

[Strict probabilistic leveling for Monty Hall]

Let us consider the program $\Pi_{monty3}$ from Example 23 and a leveling function

$|prize| = 0$
$|selected| = 1$
$|can\_open(D)| = 1$
$|open| = 2$

We claim that this leveling is a strict probabilistic leveling. Conditions (1)-(3) of the definition can be checked
directly. To check the last condition it is sufficient to notice that for every $D$ the only random attribute terms on
which non-random attribute term $can\_open(D)$ depends are $selected$ and $prize$. □

Let $\Pi$ be a reasonable program with signature $\Sigma$ and leveling $|\ |$, and let $a_1(t_1), \ldots, a_n(t_n)$ be an ordering of its
random attribute terms induced by $|\ |$. By $L_i$ we denote the set of literals of $\Sigma$ which do not depend on literals
formed by $a_j(t_j)$ where $i \leq j$. $\Pi_i$ for $1 \leq i \leq n + 1$ consists of all declarations of $\Pi$, along with the regular
rules, random selection rules, actions, and observations of $\Pi$ such that every literal occurring in them belongs to
$L_i$. We'll often refer to $\Pi_1, \ldots, \Pi_{n+1}$ as a $|\ |$-induced structure of $\Pi$.

Example 25

[Induced structure for Monty Hall]

To better understand this construction let us consider a leveling function $|\ |$ from Example 24. It induces the
following ordering of random attributes of the corresponding program.

$a_1 = prize$.

$a_2 = selected$.

$a_3 = open$.

The corresponding languages are

$L_1 = \emptyset$

$L_2 = \{prize = d : d \in doors\}$

$L_3 = L_2 \cup \{selected = d : d \in doors\} \cup \{can\_open(d) : d \in doors\} \cup \{\neg can\_open(d) : d \in doors\}$
$L_4 = L_3 \cup \{open = d : d \in doors\}$

Finally, the induced structure of the program is as follows (numbers refer to the numbered statements of Subsection
5.1):

$\Pi_1 = \{1, 2\}$
$\Pi_2 = \{1, 2, 6\}$
$\Pi_3 = \{1, \ldots, 7\}$
$\Pi_4 = \{1, \ldots, 8\}$ □

Before proceeding we introduce some terminology. 

Definition 12 
[Active attribute term] 

If there is $y$ such that $a(t) = y$ is possible in $W$ with respect to $\Pi$, we say that $a(t)$ is active in $W$ with respect to
$\Pi$. □



Definition 13

[Causally ordered programs]

Let $\Pi$ be a P-log program with a strict probabilistic leveling $|\ |$ and let $a_i$ be the $i$th random attribute of $\Pi$ with
respect to $|\ |$. We say that $\Pi$ is causally ordered if

1. $\Pi_1$ has exactly one possible world;

2. if $W$ is a possible world of $\Pi_i$ and atom $a_i(t_i) = y_0$ is possible in $W$ with respect to $\Pi_{i+1}$ then the program
$W \cup \Pi_{i+1} \cup obs(a_i(t_i) = y_0)$ has exactly one possible world; and

3. if $W$ is a possible world of $\Pi_i$ and $a_i(t_i)$ is not active in $W$ with respect to $\Pi_{i+1}$ then the program $W \cup \Pi_{i+1}$
has exactly one possible world. □

Intuitively, a program is causally ordered if (1) all nondeterminism in the program results from random selections, 
and (2) whenever a random selection is active in a given possible world, the possible outcomes of that selection 
are not constrained in that possible world by logical rules or other random selections. The following is a simple 
example of a program which is not causally ordered, because it violates the second condition. By comparison with 
Example 12 it also illustrates the difference between the statements $a$ and $pr(a) = 1$.

Example 26

[Non-causally ordered programs]
Consider the P-log program $\Pi$ consisting of:

1. $a : boolean.$

2. $random(a).$

3. $a.$

The only leveling function for this program is $|a| = 0$, hence $L_1 = \emptyset$ while $L_2 = \{a, \neg a\}$; and $\Pi_1 = \{1\}$
while $\Pi_2 = \{1, 2, 3\}$. Obviously, $\Pi_1$ has exactly one possible world, namely $W_1 = \emptyset$. Both literals, $a$ and $\neg a$, are
possible in $W_1$ with respect to $\Pi_2$. However, $W_1 \cup \Pi_2 \cup obs(\neg a)$ has no possible worlds, and hence the program
does not satisfy Condition 2 of the definition of causally ordered.

Now let us consider program $\Pi'$ consisting of rules (1) and (2) of $\Pi$ and the rules

$b \leftarrow not\ \neg b, a.$
$\neg b \leftarrow not\ b, a.$

The only strict probabilistic leveling function for this program maps $a$ to 0 and $b$ to 1. The resulting languages are
$L_1 = \emptyset$ and $L_2 = \{a, \neg a, b, \neg b\}$. Hence $\Pi'_1 = \{1\}$ and $\Pi'_2 = \Pi'$. As before, $W_1$ is empty and $a$ and $\neg a$ are both
possible in $W_1$ with respect to $\Pi'_2$. It is easy to see that program $W_1 \cup \Pi'_2 \cup obs(a)$ has two possible worlds, one
containing $b$ and another containing $\neg b$. Hence Condition 2 of the definition of causally ordered is again violated.

Finally, consider program $\Pi''$ consisting of rules:

1. $a, b : boolean.$

2. $random(a).$

3. $random(b) \leftarrow a.$

4. $\neg b \leftarrow \neg a.$

5. $c \leftarrow \neg b.$

6. $\neg c.$

It is easy to check that $c$ immediately depends on $\neg b$, which in turn immediately depends on $a$ and $\neg a$. $b$ immediately
depends on $a$. It follows that any strict probabilistic leveling function for this program will lead to the
ordering $a, b$ of random attribute terms. Hence $L_1 = \{\neg c\}$, $L_2 = \{\neg c, a, \neg a\}$, and $L_3 = L_2 \cup \{b, \neg b, c\}$. This
implies that $\Pi''_1 = \{1, 6\}$, $\Pi''_2 = \{1, 2, 6\}$, and $\Pi''_3 = \{1, \ldots, 6\}$. Now consider a possible world $W = \{\neg c, \neg a\}$
of $\Pi''_2$. It is easy to see that the second random attribute, $b$, is not active in $W$ with respect to $\Pi''_3$, but $W \cup \Pi''_3$ has
no possible world. This violates Condition 3 of causally ordered.

Note that all the above programs are consistent. A program whose regular part consists of the rule $p \leftarrow not\ p$
is neither causally ordered nor consistent. Similarly, the program obtained from $\Pi$ above by adding the atom
$pr(a) = 1/2$ is neither causally ordered nor consistent. □



Example 27

[Monty Hall program is causally ordered]

We now show that the Monty Hall program $\Pi_{monty3}$ is causally ordered. We use the strict probabilistic leveling
and induced structure from Examples 24 and 25. Obviously, $\Pi_1$ has one possible world $W_1 = \emptyset$. The atoms
possible in $W_1$ with respect to $\Pi_2$ are $prize = 1$, $prize = 2$, $prize = 3$. So we must check Condition 2 from the
definition of causally ordered for every atom $prize = d$ from this set. It is not difficult to show that the translation
$\tau(W_1 \cup \Pi_2 \cup obs(prize = d))$ is equivalent to the logic program consisting of the translation of declarations into
Answer Set Prolog along with the following rules:

$prize(1)\ or\ prize(2)\ or\ prize(3).$
$\neg prize(D_1) \leftarrow prize(D_2), D_1 \neq D_2.$
$\leftarrow obs(prize(d)), not\ prize(d).$
$obs(prize(d)).$

where $D_1$ and $D_2$ range over the doors. Except for the possible occurrences of observations this program is equivalent
to

$\neg prize(D_1) \leftarrow prize(D_2), D_1 \neq D_2.$
$prize(d).$

which has a unique answer set of the form

$\{prize(d), \neg prize(d_1), \neg prize(d_2)\}$ (19)

(where $d_1$ and $d_2$ are the other two doors besides $d$). Now let $W_2$ be an arbitrary possible world of $\Pi_2$, and $l$ be an
atom possible in $W_2$ with respect to $\Pi_3$. To verify Condition 2 of the definition of causally ordered for $i = 2$, we
must show that $W_2 \cup \Pi_3 \cup obs(l)$ has exactly one answer set. It is easy to see that $W_2$ must be of the form (19),
and $l$ must be of the form $selected = d'$ for some door $d'$.

Similarly to above, the translation of $W_2 \cup \Pi_3 \cup obs(selected(d'))$ has the same answer sets (except for possible
occurrences of observations) as the program consisting of $W_2$ along with the following rules:

$selected(d').$
$\neg selected(D_1) \leftarrow selected(D_2), D_1 \neq D_2.$
$\neg can\_open(D) \leftarrow selected(D).$
$\neg can\_open(D) \leftarrow prize(D).$
$can\_open(D) \leftarrow not\ \neg can\_open(D).$

If negated literals are treated as new predicate symbols we can view this program as stratified. Hence the program 
obtained in this way has a unique answer set. This means that the above program has at most one answer set; but it 
is easy to see it is consistent and so it has exactly one. It now follows that Condition 2 is satisfied for i = 2. 

Checking Condition 2 for i = 3 is similar, and completes the proof. □ 

"Causal ordering" is one of two conditions which together guarantee the coherency of a P-log program. Causal 
ordering is a condition on the logical part of the program. The other condition — that the program must be "unitary" 
— is a condition on the pr-atoms. It says that, basically, assigned probabilities, if any, must be given in a way that 
permits the appropriate assigned and default probabilities to sum to 1. In order to define this notion precisely, and
state the main theorem of this section, we will need some terminology. 

Let $\Pi$ be a ground P-log program containing the random selection rule

$[r]\ random(a(t) : \{Y : p(Y)\}) \leftarrow K.$

We will refer to a ground pr-atom

$pr_r(a(t) = y \mid_c B) = v.$

as a pr-atom indexing $r$. We will refer to $B$ as the body of the pr-atom. We will refer to $v$ as the probability
assigned by the pr-atom.

Let $W_1$ and $W_2$ be possible worlds of $\Pi$ satisfying $K$. We say that $W_1$ and $W_2$ are probabilistically equivalent
with respect to $r$ if

1. for all $y$, $p(y) \in W_1$ if and only if $p(y) \in W_2$, and

2. for every pr-atom $q$ indexing $r$, $W_1$ satisfies the body of $q$ if and only if $W_2$ satisfies the body of $q$.

A scenario for $r$ is an equivalence class of possible worlds of $\Pi$ satisfying $K$, under probabilistic equivalence with
respect to $r$.

Example 28

[Rat Example Revisited]

Consider the program from Example 18 involving the rat, and its possible worlds $W_1, W_2, W_3, W_4$. All four
possible worlds are probabilistically equivalent with respect to Rule [1]. With respect to Rule [2] $W_1$ is equivalent
to $W_2$, and $W_3$ is equivalent to $W_4$. Hence Rule [2] has two scenarios, $\{W_1, W_2\}$ and $\{W_3, W_4\}$. □

$range(a(t), r, s)$ will denote the set of possible values of $a(t)$ in the possible worlds belonging to scenario $s$ of
rule $r$. This is well defined by (1) of the definition of probabilistic equivalence w.r.t. $r$. For example, in the rat
program, $range(death, 2, \{W_1, W_2\}) = \{true, false\}$.

Let $s$ be a scenario of rule $r$. A pr-atom $q$ indexing $r$ is said to be active in $s$ if every possible world of $s$ satisfies
the body of $q$.

For a random selection rule $r$ and scenario $s$ of $r$, let $at_r(s)$ denote the set of probability atoms which are active
in $s$. For example, $at_2(\{W_1, W_2\})$ is the singleton set $\{pr(death \mid_c arsenic) = 0.8\}$.



Definition 14
[Unitary Rule]

Rule $r$ is unitary in $\Pi$, or simply unitary, if for every scenario $s$ of $r$, one of the following conditions holds:

1. For every $y$ in $range(a(t), r, s)$, $at_r(s)$ contains a pr-atom of the form $pr_r(a(t) = y \mid_c B) = v$, and
moreover the sum of the values of the probabilities assigned by members of $at_r(s)$ is 1; or

2. There is a $y$ in $range(a(t), r, s)$ such that $at_r(s)$ contains no pr-atom of the form $pr_r(a(t) = y \mid_c B) = v$,
and the sum of the probabilities assigned by the members of $at_r(s)$ is less than or equal to 1. □



Definition 15 
[Unitary Program] 

A P-log program is unitary if each of its random selection rules is unitary. □ 



Example 29

[Rat Example Revisited]

Consider again Example 18 involving the rat. There is clearly only one scenario, $s_1$, for the Rule
$[1]\ random(arsenic)$, which consists of all possible worlds of the program. $at_1(s_1)$ consists of the single pr-atom
$pr(arsenic) = 0.4$. Hence the scenario satisfies Condition 2 of the definition of unitary.

We next consider the selection rule $[2]\ random(death)$. There are two scenarios for this rule: $s_{arsenic}$, consisting
of possible worlds satisfying $arsenic$, and its complement $s_{noarsenic}$. Condition 2 of the definition of unitary is
satisfied for each element of the partition. □
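
The two conditions of the unitary-rule definition are easy to test for a single scenario once its range and active pr-atoms are known. The sketch below is an illustration only: it assumes that at most one active pr-atom exists per value (so the atoms fit in a dictionary), and the function name `scenario_is_unitary` and the tolerance parameter are choices made here.

```python
def scenario_is_unitary(value_range, active_atoms, tol=1e-9):
    """Check the unitary conditions for one scenario of a random selection rule.

    value_range  -- possible values of the attribute term in the scenario
    active_atoms -- {value: assigned probability} for the active pr-atoms
    """
    total = sum(active_atoms.values())
    if all(y in active_atoms for y in value_range):
        # Condition 1: every value is assigned, so the sum must be exactly 1.
        return abs(total - 1) <= tol
    # Condition 2: some value is left to the default, so the sum must not exceed 1.
    return total <= 1 + tol


# Rule [1] of the rat example: only pr(arsenic) = 0.4 is active.
rule1_ok = scenario_is_unitary([True, False], {True: 0.4})
# Rule [2], scenario with arsenic: only pr(death |c arsenic) = 0.8 is active.
rule2_ok = scenario_is_unitary([True, False], {True: 0.8})
print(rule1_ok, rule2_ok)  # True True
```

A scenario in which both values are assigned but the assignments sum to more than 1, for example 0.7 and 0.6, fails the check.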

We are now ready to state the main theorem of this section, the proof of which will be given in Appendix I. 



Theorem 1 

[Sufficient Conditions for Coherency] 

Every causally ordered, unitary P-log program is coherent. □ 

Using the above examples one can easily check that the rat, Monty Hall, and Simpson's examples are causally 
ordered and unitary, and therefore coherent. 

As the final result of this section, we show that P-log can represent the probability distribution of any finite
set of random variables each taking finitely many values in a classical probability space.

Theorem 2

[Embedding Probability Distributions in P-log]

Let $x_1, \ldots, x_n$ be a nonempty vector of random variables, under a classical probability $P$, taking finitely many
values each. Let $R_i$ be the set of possible values of each $x_i$, and assume $R_i$ is nonempty for each $i$. Then there
exists a coherent P-log program $\Pi$ with random attributes $x_1, \ldots, x_n$ such that for every vector $r_1, \ldots, r_n$ from
$R_1 \times \cdots \times R_n$, we have

$P(x_1 = r_1, \ldots, x_n = r_n) = P_\Pi(x_1 = r_1, \ldots, x_n = r_n)$ (20)

□

The proof of this theorem appears in Appendix I. It is a corollary of this theorem that if B is a finite Bayesian
network, each of whose nodes is associated with a random variable taking finitely many possible values, then there 
is a P-log program which represents the same probability distribution as B. This by itself is not surprising, and 
could be shown trivially by considering a single random attribute whose values range over possible states of a 
given Bayes net. Our proof, however, shows something more - namely, that the construction of the P-log program 
corresponds straightforwardly to the graphical structure of the network, along with the conditional densities of its 
variables given their parents in the network. Hence any Bayes net can be represented by a P-log program which is 
"syntactically isomorphic" to the network, and preserves the intuitions present in the network representation. 



7 Relation with other work 

As we mention in the first sentence of this paper, the motivation behind developing P-log is to have a knowledge
representation language that allows natural and elaboration tolerant representation of common-sense knowledge
involving logic and probabilities. While some of the other probabilistic logic programming languages such as
(Poole 1993; Poole 2000) and (Vennekens et al. 2004; Vennekens 2007) have similar goals, many other probabilistic
logic programming languages have "statistical relational learning (SRL)" (Getoor et al. 2007) as one of their
main goals and as a result they perhaps consciously sacrifice on the knowledge representation dimensions. In this
section we describe the approaches in (Poole 1993; Poole 2000) and (Vennekens et al. 2004; Vennekens 2007) and
compare them with P-log. We also survey many other works on probabilistic logic programming, including the
ones that have SRL as one of their main goals, and relate them to P-log from the perspective of representation and
reasoning.



7.1 Relation with Poole's work 

Our approach in this paper has many similarities with (and many differences from) the works of Poole (Poole 1993;
Poole 2000). To give a somewhat detailed comparison, we start with some of the definitions from (Poole 1993).



7.1.1 Overview of Poole's probabilistic Horn abduction

In Poole's probabilistic Horn abduction (PHA), disjoint declarations are an important component. We start with 
their definition. (In our adaptation of the original definitions we consider the grounding of the theory, so as to make 
it simpler.) 



Definition 16

Disjoint declarations are of the form $disjoint([h_1 : p_1; \ldots; h_n : p_n])$, where the $h_i$s are different ground atoms,
referred to as hypotheses or assumables, the $p_i$s are real numbers, and $p_1 + \ldots + p_n = 1$. □

We now define a PHA theory. 

Definition 17 

A probabilistic Horn abduction (PHA) theory is a collection of definite clauses and disjoint declarations such that 
no atom occurs in two disjoint declarations. □ 

Given a PHA theory T, the facts of T, denoted by $F_T$, consist of

• the collection of definite clauses in T, and

• for every disjoint declaration D in T, and for every $h_i$ and $h_j$, $i \neq j$, in D, integrity constraints of the form:

$\leftarrow h_i, h_j.$

The hypotheses of T, denoted by $H_T$, is the set of $h_i$ occurring in disjoint declarations of T.

The prior probability of T is denoted by $P_T$ and is a function $H_T \to [0, 1]$ defined such that $P_T(h_i) = p_i$
whenever $h_i : p_i$ is in a disjoint declaration of T. Based on this prior probability and the assumption, denoted by
(Hyp-independent), that hypotheses that are consistent with $F_T$ are (probabilistically) independent of each other,
we have the following definition of the joint probability of a set of hypotheses.

Definition 18 

Let $\{h_1, \ldots, h_k\}$ be a set of hypotheses where each $h_i$ is from a disjoint declaration. Then, their joint probability
is given by $P_T(h_1) \times \ldots \times P_T(h_k)$. □

Poole (Poole 1993) makes the following additional assumptions about $F_T$ and $H_T$:

1. (Hyp-not-head) There are no rules in $F_T$ whose head is a member of $H_T$ (i.e., hypotheses do not appear in
the head of rules).

2. (Acyclic-definite) $F_T$ is acyclic.

3. (Completion-cond) The semantics of $F_T$ is given via its Clark's completion.

4. (Body-not-overlap) The bodies of the rules in $F_T$ for an atom are mutually exclusive (i.e., if we have
$a \leftarrow B_i$ and $a \leftarrow B_j$ in $F_T$, where $i \neq j$, then $B_i$ and $B_j$ cannot be true at the same time).

Poole presents his rationale behind the above assumptions, which he says make the language weak. His rationale
is based on his goal to develop a simple extension of Pure Prolog (definite logic programs) with Clark's completion
based semantics, that allows interpreting the numbers in the hypotheses as probabilities. Thus he restricts the syntax
to disallow any case that might make the above mentioned interpretation difficult.

We now define the notions of explanations and minimal explanations and use them to define the probability distribution
and conditional probabilities embedded in a PHA theory.

Definition 19

If g is a formula, an explanation of g from $\langle F_T, H_T \rangle$ is a subset D of $H_T$ such that $F_T \cup D \models g$ and $F_T \cup D$ has
a model.

A minimal explanation of g is an explanation of g such that no strict subset is an explanation of g. □

Poole proves that under the above mentioned assumptions, if $min\_expl(g, T)$ is the set of all minimal explanations
of g from $\langle F_T, H_T \rangle$ and $Comp(T)$ is the Clark's completion of $F_T$ then

$$Comp(T) \models \Big( g \equiv \bigvee_{e_i \in min\_expl(g,T)} e_i \Big)$$

Definition 20

For a formula g, its probability P with respect to a PHA theory T is defined as:

$$P(g) = \sum_{e_i \in min\_expl(g,T)} P_T(e_i)$$

□



Conditional probabilities are defined using the standard definition:

$$P(\alpha \mid \beta) = \frac{P(\alpha \wedge \beta)}{P(\beta)}$$
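As a rough illustration of Definitions 18 and 20 and of the conditional probability above, the following sketch computes probabilities for a tiny PHA theory. The theory itself (the atoms `rain`, `sprinkler_on`, `wet` and their priors) is a hypothetical example, not one taken from Poole's paper, and the minimal explanations are listed by hand rather than derived by abduction.

```python
from fractions import Fraction
from math import prod

# Hypothetical PHA theory (names are illustrative, not from Poole's paper):
#   disjoint([rain : 3/10; no_rain : 7/10])
#   disjoint([sprinkler_on : 1/2; sprinkler_off : 1/2])
#   wet <- rain.      wet <- no_rain, sprinkler_on.
# The two rule bodies for wet are mutually exclusive (Body-not-overlap).
PRIOR = {
    "rain": Fraction(3, 10), "no_rain": Fraction(7, 10),
    "sprinkler_on": Fraction(1, 2), "sprinkler_off": Fraction(1, 2),
}

def joint(explanation):
    """Definition 18: product of the priors of independent hypotheses."""
    return prod(PRIOR[h] for h in explanation)

def probability(minimal_explanations):
    """Definition 20: sum over the minimal explanations of a formula."""
    return sum(joint(e) for e in minimal_explanations)

# Minimal explanations, listed by hand for this tiny theory.
EXPL_WET = [{"rain"}, {"no_rain", "sprinkler_on"}]
EXPL_RAIN_AND_WET = [{"rain"}]

p_wet = probability(EXPL_WET)
# Standard conditional probability: P(rain | wet) = P(rain and wet) / P(wet).
p_rain_given_wet = probability(EXPL_RAIN_AND_WET) / p_wet
```

Summing over explanations is only sound here because the explanations are mutually exclusive, which is what the Body-not-overlap assumption is meant to guarantee.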



We now relate his work with ours. 



7.1.2 Poole's PHA compared with P-log

• The disjoint declarations in PHA have some similarity with our random declarations. Following are some of
the main differences:

— (Disj1) The disjoint declarations assign probabilities to the hypotheses in that declaration. We use
probability atoms to specify probabilities, and our random declarations do not mention probabilities.

— (Disj2) Our random declarations have conditions. We also specify a range for the attributes. Both the
conditions and attributes use predicates that are defined using rules. The usefulness of this is evident
from the formulation of the Monty Hall problem where we use the random declaration
random(open : {X : can_open(X)}).

The disjoint declarations of PHA theories do not have conditions and they do not specify ranges.

— (Disj3) While the hypotheses in disjoint declarations are arbitrary atoms, our random declarations are
about attributes.

• (Pr-atom-gen) Our specification of the probabilities using pr-atoms is more general than the probability
specified using disjoint declarations. For example, in specifying the probabilities of the dice we say:
pr(roll(D) = Y |_c owner(D) = john) = 1/6.

• (CBN) We directly specify the conditional probabilities in causal Bayes nets, while in PHA only prior 
probabilities are specified. Thus expressing a Bayes network is straightforward in P-log while in PHA it 
would necessitate a transformation. 

• (Body-not-overlap2) Since Poole's PHA assumes that the definite rules with the same atom in the head
have bodies that cannot be true at the same time, many rules that can be directly written in our formalism
need to be transformed so as to satisfy the above mentioned condition on their bodies.

• (Gen) While Poole makes many a-priori restrictions on his rules, we follow the opposite approach and ini- 
tially do not make any restrictions on our logical part. Thus we have an unrestricted logical knowledge 
representation language (such as ASP or CR-Prolog) at our disposal. We define a semantic notion of consis- 
tent P-log programs and give sufficiency conditions, more general than Poole's restrictions, that guarantee 
consistency.

• (Obs-do) Unlike us, Poole does not distinguish between doing and observing.

• (Gen-upd) We consider very general updates, beyond an observation of a propositional fact or an action that 
makes a propositional fact true. 

• (Prob-def) Not all probability numbers need be explicitly given in P-log. It has a default mechanism to
implicitly assume certain probabilities that are not explicitly given. This often makes the representation
simpler.

• Our probability calculation is based on possible worlds, which is not the case in PHA, although Poole's later
formulation of Independent Choice Logic (ICL) (Poole 1997; Poole 2000) uses possible worlds.
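Two of the points in the list above, the dynamic range of a random declaration (Disj2) and the default probabilities (Prob-def), can be illustrated on the Monty Hall problem. The sketch below is plain Python enumeration of possible worlds, not P-log and not the P-log inference engine. It assumes three doors, treats the player's selection as fixed, and assumes that unspecified probabilities default to a uniform distribution over the current range.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def can_open(door, prize, selected):
    # Logical rule: Monty may open a door that is neither the selected
    # door nor the door hiding the prize.
    return door != prize and door != selected

def possible_worlds(selected):
    """Yield (prize, opened, probability) for a fixed player selection."""
    for prize in DOORS:
        p_prize = Fraction(1, len(DOORS))  # assumed default: uniform over the range
        # The range of `open` depends on the world, as in
        # random(open : {X : can_open(X)}).
        openable = [d for d in DOORS if can_open(d, prize, selected)]
        for opened in openable:
            yield prize, opened, p_prize * Fraction(1, len(openable))

def posterior_prize(selected, opened_obs):
    """Condition on the observation that Monty opened door `opened_obs`."""
    worlds = [(pz, pr) for pz, op, pr in possible_worlds(selected) if op == opened_obs]
    total = sum(pr for _, pr in worlds)
    return {d: sum(pr for pz, pr in worlds if pz == d) / total for d in DOORS}

posterior = posterior_prize(selected=1, opened_obs=2)
```

With door 1 selected and door 2 opened, the enumeration assigns probability 2/3 to the prize being behind door 3, so switching is the better choice.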

7.1.3 Poole's ICL compared with P-log

Poole's Independent Choice Logic (ICL) (Poole 1997; Poole 2000) refines his PHA in several ways: by replacing the set of
disjoint declarations by a choice space (where individual disjoint declarations are replaced by alternatives, and a hypothesis
in an individual disjoint declaration is replaced by an atomic choice); by replacing definite programs and their
Clark's completion semantics by acyclic normal logic programs and their stable model semantics; by enumerating
the atomic choices across alternatives and defining possible worlds (see the footnote on Poole's possible worlds) rather
than using minimal explanation based abduction; and, in the process, by making fewer assumptions. In particular, the
assumption Completion-cond is no longer there, the assumption Body-not-overlap is only made in the context of being
able to obtain the probability of a formula g by adding the probabilities of its explanations, and the assumption
Acyclic-definite is relaxed to allow acyclic normal programs, while the assumptions Hyp-not-head and Hyp-independent
remain in slightly modified form by referring to atomic choices across alternatives rather than hypotheses across
disjoint declarations.
Nevertheless, most of the differences between PHA and P-log carry over to the differences between ICL and P-log.
In particular, all the differences mentioned in the previous section, with the exception of Body-not-overlap2,
remain, modulo the change from the notion of hypothesis in PHA to the notion of atomic choice in ICL.

7.2 LPAD: Logic programming with annotated disjunctions

In recent work (Vennekens et al. 2004), Vennekens et al. have proposed the LPAD formalism. An LPAD program
consists of rules of the form:

$(h_1 : \alpha_1) \vee \ldots \vee (h_n : \alpha_n) \leftarrow b_1, \ldots, b_m$

where the $h_i$s are atoms, the $b_i$s are atoms or atoms preceded by not, and the $\alpha_i$s are real numbers in the interval [0, 1], such
that $\sum_{i=1}^{n} \alpha_i = 1$.

An LPAD rule instance is of the form:

$h_i \leftarrow b_1, \ldots, b_m.$

The associated probability of the above rule instance is then said to be $\alpha_i$.

An instance of an LPAD program P is a (normal logic program) P' obtained as follows: for each rule in P exactly
one of its instances is included in P', and nothing else is in P'. The associated probability of an instance P', denoted
by $\pi(P')$, of an LPAD program is the product of the associated probabilities of its rules.

An LPAD program is said to be sound if each of its instances has a 2-valued well-founded model. Given an LPAD
program P, and a collection of atoms I, the probability assigned to I by P is given as follows:

$$\pi_P(I) = \sum_{P' \text{ is an instance of } P \text{ and } I \text{ is the well-founded model of } P'} \pi(P')$$

The probability of a formula $\phi$ assigned by an LPAD program P is then defined as:

$$\pi_P(\phi) = \sum_{\phi \text{ is satisfied by } I} \pi_P(I)$$

Footnote (Section 7.1.3): Poole's possible worlds are very similar to ours except that he explicitly assumes that the
possible worlds whose core would be obtained by the enumeration cannot be eliminated by the acyclic programs through
constraints. We do not make such an assumption, allow elimination of such cores, and if one or more (but not all)
possible worlds are eliminated then we use normalization to redistribute the probabilities.
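The LPAD semantics of Section 7.2 can be sketched by brute-force enumeration of program instances. The sketch below is restricted to rules with positive bodies, an assumption made here so that the well-founded model of each instance coincides with its least model and can be computed by simple forward chaining. The coin-toss program and all atom names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Each LPAD rule: (list of (head atom, probability), list of positive body atoms).
RULES = [
    ([("toss", Fraction(1))], []),
    ([("heads", Fraction(1, 2)), ("tails", Fraction(1, 2))], ["toss"]),
    ([("win", Fraction(4, 5)), ("lose", Fraction(1, 5))], ["heads"]),
]

def least_model(instance):
    """Forward chaining over a definite program given as (head, body) pairs."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in instance:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return frozenset(model)

def model_distribution(rules):
    """Map each model I to pi_P(I), summing pi(P') over the instances P' yielding I."""
    dist = {}
    for choice in product(*(heads for heads, _ in rules)):
        instance = [(h, body) for (h, _), (_, body) in zip(choice, rules)]
        weight = Fraction(1)
        for _, p in choice:
            weight *= p  # pi(P'): product of the chosen heads' probabilities
        model = least_model(instance)
        dist[model] = dist.get(model, Fraction(0)) + weight
    return dist

def atom_probability(rules, atom):
    """pi_P(phi) for an atomic formula phi."""
    return sum(w for m, w in model_distribution(rules).items() if atom in m)

p_win = atom_probability(RULES, "win")
```

Note that two different instances (choosing `win` or `lose` in the third rule when `tails` was chosen in the second) yield the same model, which is why the probabilities are accumulated per model.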

7.2.1 Relating LPAD with P-log 

LPAD is richer in syntax than PHA or ICL in that its rules (corresponding to disjoint declarations in PHA and
a choice space in ICL) may have conditions. In that sense it is closer to the random declarations in P-log. Thus,
unlike PHA and ICL, and similar to P-log, Bayes networks can be expressed in LPAD fairly directly. Nevertheless
LPAD has some significant differences with P-log, including the following:

• The goal of LPAD is to provide succinct representations for probability distributions. Our goals are broader,
viz., to combine probabilistic and logical reasoning. Consequently P-log is logically more expressive, for
example containing classical negation and the ability to represent defaults.

• The ranges of random selections in LPAD are taken directly from the heads of rules, and are therefore static.
The ranges of selections in P-log are dynamic in the sense that they may be different in different possible
worlds. For example, consider the representation

random(open : {X : can_open(X)}).

of the Monty Hall problem. It is not clear how the above can be succinctly expressed in LPAD.



7.3 Bayesian logic programming

A Bayesian logic program (BLP) (Kersting and De Raedt 2007) has two parts, a logical part and a set of conditional
probability tables. The logical part of the BLP consists of clauses (referred to as BLP clauses) of the form:

$H \mid A_1, \ldots, A_n$

where $H, A_1, \ldots, A_n$ are (Bayesian) atoms which can take a value from a given domain associated with the atom.
Following is an example of a BLP clause from (Kersting and De Raedt 2007):

burglary(X) | neighborhood(X).

Its corresponding domain could be, for example, $D_{burglary} = \{yes, no\}$, and $D_{neighborhood} =
\{bad, average, good\}$.

Each BLP clause has an associated conditional probability table (CPT). For example, the above clause may have
the following table:

neighborhood(X)    burglary(X) = yes    burglary(X) = no
bad                0.6                  0.4
average            0.4                  0.6
good               0.3                  0.7

A ground BLP clause is similar to a ground logic programming rule. It is obtained by substituting variables with
ground terms from the Herbrand universe. If the ground version of a BLP program is acyclic, then a BLP can
be considered as representing a Bayes network with a possibly infinite number of nodes. To deal with the situation
when the ground version of a BLP has multiple rules with the same atom in the head, the formalism allows for
specification of combining rules that specify how a set of ground BLP rules (with the same ground atom in the
head) and their CPTs can be combined to a single BLP rule and a single associated CPT.

The semantics of an acyclic BLP is thus given by the characterization of the corresponding Bayes net obtained as 
described above. 
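To make the reading of a ground BLP as a Bayes net concrete, the sketch below grounds the burglary clause over a set of constants and computes a marginal from the CPT. The CPT values are those of the example table; the prior over `neighborhood` and the constants `ann` and `bob` are hypothetical additions made only for illustration, since the text does not specify them.

```python
from fractions import Fraction

# CPT of the clause burglary(X) | neighborhood(X), from the example table.
CPT_BURGLARY = {
    "bad": {"yes": Fraction(6, 10), "no": Fraction(4, 10)},
    "average": {"yes": Fraction(4, 10), "no": Fraction(6, 10)},
    "good": {"yes": Fraction(3, 10), "no": Fraction(7, 10)},
}

# Hypothetical prior for neighborhood(X); not given in the text.
PRIOR_NEIGHBORHOOD = {
    "bad": Fraction(1, 3), "average": Fraction(1, 3), "good": Fraction(1, 3),
}

def ground_network(constants):
    """Ground the clause: each node burglary(c) gets the parent neighborhood(c)."""
    return {f"burglary({c})": [f"neighborhood({c})"] for c in constants}

def marginal_burglary(value):
    """P(burglary(c) = value), summing the parent out of the two-node net."""
    return sum(PRIOR_NEIGHBORHOOD[n] * CPT_BURGLARY[n][value] for n in PRIOR_NEIGHBORHOOD)

network = ground_network(["ann", "bob"])
p_yes = marginal_burglary("yes")
```

Every ground instance shares the same CPT, which is what lets a single BLP clause stand for arbitrarily many Bayes net nodes.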



7.3.1 Relating BLPs with P-log 

The aim of BLPs is to enhance Bayes nets so as to overcome some of the limitations of Bayes nets such as
difficulties with representing relations. On the other hand, like Bayes nets, BLPs are also concerned with statistical
relational learning. Hence the BLP research is less concerned with general knowledge representation than P-log is,
and this is the source of most of the differences between the two approaches. Among the resulting differences between
BLP and P-log are:

• In BLP every ground atom represents a random variable. This is not the case in P-log.

• In BLP the values the atoms can take are fixed by their domain. This is not the case in P-log where through
the random declarations an attribute can have different domains under different conditions.

• Although the logical part of a BLP looks like a logic program (when one replaces | by the connective
←), its meaning is different from the meaning of the corresponding logic program. Each BLP clause is a
compact representation of multiple logical relationships with associated probabilities that are given using a
conditional probability table.

• In BLP one can specify a combining rule. We do not allow such specification.

The ALTERID language of (Breese 1990; Wellman et al. 1992) is similar to BLPs and has similar differences with
P-log.



7.3.2 Probabilistic knowledge bases 

Bayesian logic programs mentioned in the previous subsections were inspired by the probabilistic knowledge bases
(PKBs) of (Ngo and Haddawy 1997). We now give a brief description of this formalism.

In this formalism each predicate represents a set of similar random variables. It is assumed that each predicate
has at least one attribute representing the value of random attributes made up of that predicate. For example, the
random variable Colour of a car C can be represented by a 2-ary predicate color(C, Col), where the first position
takes the id of a particular car, and the second indicates the color (say, blue, red, etc.) of the car C.

A probabilistic knowledge base consists of three parts:

• A set of probabilistic sentences of the form:

$pr(A_0 \mid A_1, \ldots, A_n) = \alpha$, where the $A_i$s are atoms.

• A set of value integrity constraints of the form:

$EXCLUSIVE(p, a_1, \ldots, a_n)$, where p is a predicate, and the $a_i$s are values that can be taken by random
variables made up of that predicate.

• A set of combining rules. 






The combining rules serve a similar purpose as in Bayesian logic programs. Note that unlike Bayesian logic programs,
which have a CPT for each BLP clause, the probabilistic sentences in PKBs each have only a single probability
associated with them. Thus the semantic characterization is much more complicated. Nevertheless the differences
between P-log and Bayesian logic programs also carry over to PKBs.



7.4 Stochastic logic programs 



A Stochastic logic program (SLP) (Muggleton 1995) P is a collection of clauses of the form

$p : A \leftarrow B_1, \ldots, B_n$

where p (referred to as the probability label) belongs to [0, 1], and $A, B_1, \ldots, B_n$ are atoms, with the requirements
that (a) $A \leftarrow B_1, \ldots, B_n$ is range restricted and (b) for each predicate symbol q in P, the probability labels for all
clauses with q in the head sum to 1.

The probability of an atom g with respect to an SLP P is obtained by summing the probabilities of the various
SLD-refutations of $\leftarrow g$ with respect to P, where the probability of a refutation is computed by multiplying the
probabilities of the various choices, and doing appropriate normalization. For example, if the first atom of a subgoal
$\leftarrow g'$ unifies with the heads of stochastic clauses $p_1 : C_1, \ldots, p_m : C_m$, and the stochastic clause $p_i : C_i$ is
chosen for the refutation, then the probability of this choice is $\frac{p_i}{p_1 + \cdots + p_m}$.
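The normalization step can be sketched as follows, assuming the normalized choice probability $p_i / (p_1 + \cdots + p_m)$ over the clauses whose heads unify with the selected atom. The function names and the labels in the example are hypothetical; unification and SLD resolution themselves are not modeled.

```python
from fractions import Fraction

def choice_probability(labels, i):
    """Probability of picking the i-th unifying clause: p_i / (p_1 + ... + p_m)."""
    return labels[i] / sum(labels)

def refutation_probability(choices):
    """Multiply the normalized choice probabilities along one refutation.

    `choices` is a list of (labels of the unifying clauses, chosen index) pairs,
    one pair per resolution step.
    """
    prob = Fraction(1)
    for labels, i in choices:
        prob *= choice_probability(labels, i)
    return prob

# Hypothetical refutation: at the first step only two of a predicate's clauses
# unify (labels 2/5 and 1/5), at the second step a single clause with label 1 does.
steps = [([Fraction(2, 5), Fraction(1, 5)], 0), ([Fraction(1)], 0)]
p_refutation = refutation_probability(steps)
```

Normalization matters only when some clauses for the predicate fail to unify; when all of them unify, the labels already sum to 1 by requirement (b).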



7.4.1 Relating SLPs with P-log 

SLPs, both as defined in the previous section and as in (Cussens 1999), are very different from P-log in both
syntax and semantics.

• To start with, SLPs do not allow the 'not' operator, thus limiting the expressiveness of the logical part.

• In SLPs all ground atoms represent random variables. This is not the case in P-log.

• In SLPs probability computation is through computing probabilities of refutations, a top down approach. In
P-log it is based on the possible worlds, a bottom up approach.

The above differences also carry over to probabilistic constraint logic programs (Riezler 1998; Santos Costa et al. 2003)
that generalize SLPs to constraint logic programs (CLPs).



7.5 Probabilistic logic programming 



The probabilistic logic programming formalisms in (Ng and Subrahmanian 1992; Ng and Subrahmanian 1994;
Dekhtyar and Dekhtyar 2004) and (Lukasiewicz 1998) take the representation of uncertainty to another level. In
these two approaches they are interested in classes of probability distributions and define inference methods for
checking if certain probability statements are true with respect to all the probability distributions under consideration.
To express classes of probability distributions, they use intervals where the intuitive meaning of $p : [\alpha, \beta]$ is
that the probability of p is in between $\alpha$ and $\beta$. We now discuss the two formalisms in (Ng and Subrahmanian 1992;
Ng and Subrahmanian 1994; Dekhtyar and Dekhtyar 2004) and (Lukasiewicz 1998) in further detail. We refer to
the first one as NS-PLP (short for Ng-Subrahmanian probabilistic logic programming) and the second one as L-PLP
(short for Lukasiewicz probabilistic logic programming).

7.5.1 NS-PLP

A simple NS-PLP program (Ng and Subrahmanian 1992; Ng and Subrahmanian 1994; Dekhtyar and Dekhtyar 2004) is a
finite collection of p-clauses of the form

$A_0 : [\alpha_0, \beta_0] \leftarrow A_1 : [\alpha_1, \beta_1], \ldots, A_n : [\alpha_n, \beta_n].$

where $A_0, A_1, \ldots, A_n$ are atoms, and $[\alpha_i, \beta_i] \subseteq [0, 1]$. Intuitively, the meaning of the above rule is that if the
probability of $A_1$ is in the interval $[\alpha_1, \beta_1]$, ..., and the probability of $A_n$ is in the interval $[\alpha_n, \beta_n]$ then the
probability of $A_0$ is in the interval $[\alpha_0, \beta_0]$.

The goal behind the semantic characterization of an NS-PLP program P is to obtain and express the set of (probabilistic)
p-interpretations (each of which maps possible worlds, which are subsets of the Herbrand base, to a
number in [0,1]), Mod(P), that satisfy all the p-clauses in the program. Although initially it was thought that
Mod(P) could be computed through the iteration of a fixpoint operator, (Dekhtyar and Dekhtyar 2004)
shows that this is not the case and gives a more complicated way to compute Mod(P). In particular,
(Dekhtyar and Dekhtyar 2004) shows that for many NS-PLP programs, although the fixpoint, a mapping from the
Herbrand base to an interval in [0, 1], is defined, it does not represent the set of satisfying p-interpretations.

Ng and Subrahmanian (Ng and Subrahmanian 1994) consider more general NS-PLP programs where the $A_i$s are 'basic
formulas' (which are conjunctions or disjunctions of atoms) and some of $A_1, \ldots, A_n$ are preceded by the not
operator. In the presence of not they give a semantics inspired by the stable model semantics. But in this case an
NS-PLP program may have multiple stable formula functions, each of which maps formulas to intervals in [0, 1].
While a single stable formula function can be considered as a representation of a set of p-interpretations, it is not
clear what a set of stable formula functions corresponds to. Thus NS-PLP programs and their characterization are
very different from P-log and it is not clear if one is more expressive than the other.



7.5.2 L-PLP 

An L-PLP program (Lukasiewicz 1998) is a finite set of L-PLP clauses of the form

$(H \mid B)[c_1, c_2]$

where H and B are conjunctive formulas and $c_1 \le c_2$.

Given a probability distribution Pr, an L-PLP clause of the above form is said to be true in Pr if $c_1 \le Pr(H \mid B) \le c_2$.
Pr is said to be a model of an L-PLP program $\pi$ if each clause in $\pi$ is true in Pr. $(H \mid B)[c_1, c_2]$ is said to
be a logical consequence of an L-PLP program $\pi$, denoted by $\pi \models (H \mid B)[c_1, c_2]$, if for all models Pr of $\pi$,
$(H \mid B)[c_1, c_2]$ is true in Pr. A notion of tight entailment, and correct answer to ground and non-ground queries of
the form $\exists (H \mid B)[c_1, c_2]$ is then defined in (Lukasiewicz 1998). In recent papers Lukasiewicz and his colleagues
generalize L-PLPs in several ways and define many other notions of entailment.
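The notion of a clause being true in a distribution can be sketched by checking the interval constraint directly. The sketch assumes a finite distribution over possible worlds (sets of true atoms) and uses the multiplicative form $c_1 \cdot Pr(B) \le Pr(H \wedge B) \le c_2 \cdot Pr(B)$, so that a body of probability zero satisfies the clause vacuously; that convention is an assumption of this sketch, not something fixed by the text above. The atoms and numbers are hypothetical.

```python
from fractions import Fraction

def pr(dist, formula):
    """Probability of a formula: total weight of the worlds that satisfy it."""
    return sum(p for world, p in dist.items() if formula(world))

def is_true_in(dist, head, body, c1, c2):
    """Check the clause (head | body)[c1, c2] against the distribution `dist`."""
    p_body = pr(dist, body)
    p_both = pr(dist, lambda w: head(w) and body(w))
    # Multiplicative form of c1 <= Pr(head | body) <= c2.
    return c1 * p_body <= p_both <= c2 * p_body

def is_model(dist, program):
    """Pr is a model of a program if every clause of the program is true in Pr."""
    return all(is_true_in(dist, h, b, c1, c2) for h, b, c1, c2 in program)

# Hypothetical distribution over worlds built from the atoms bird and fly.
DIST = {
    frozenset({"bird", "fly"}): Fraction(9, 20),
    frozenset({"bird"}): Fraction(1, 20),
    frozenset(): Fraction(1, 2),
}
fly = lambda w: "fly" in w
bird = lambda w: "bird" in w

loose_ok = is_true_in(DIST, fly, bird, Fraction(4, 5), Fraction(1))
tight_ok = is_true_in(DIST, fly, bird, Fraction(19, 20), Fraction(1))
```

Here Pr(fly | bird) is 9/10, so the clause with interval [4/5, 1] holds while the one with [19/20, 1] does not. Logical consequence would additionally require quantifying over all models, which this sketch does not attempt.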

In relation to NS-PLP programs, L-PLP programs have a single interval associated with an L-PLP clause and 
an L-PLP clause can be thought of as a constraint on the corresponding conditional probability. Thus, although 
'logic' is used in L-PLP programs and their characterization, it is not clear whether any of the 'logical knowledge 
representation' benefits are present in L-PLP programs. For example, it does not seem that one can define the 
values that a random variable can take, in a particular possible world, using an L-PLP program.

7.6 PRISM: Logic programs with distribution semantics

Sato in (Sato 1993) proposes the notion of "logic programs with distribution semantics," which he refers to as
PRISM as a short form for "PRogramming In Statistical Modeling." Sato starts with a possibly infinite collection
of ground atoms, F, the set $\Omega_F$ of all interpretations of F, and a completely additive probability measure $P_F$
which quantifies the likelihood of interpretations. $P_F$ is defined on some fixed $\sigma$-algebra of subsets of $\Omega_F$.

In Sato's framework interpretations of F can be used in conjunction with a Horn logic program R, which contains
no rules whose heads unify with atoms from F. Sato's logic program is a triple, $\Pi = \langle F, P_F, R \rangle$. The semantics of
$\Pi$ are given by a collection $\Omega_\Pi$ of possible worlds and the probability measure $P_\Pi$. A set M of ground atoms in the
language of $\Pi$ belongs to $\Omega_\Pi$ iff M is a minimal Herbrand model of a logic program $I_F \cup R$ for some interpretation
$I_F$ of F. The completely additive probability measure $P_\Pi$ is defined as an extension of $P_F$.

Given a specification of $P_F$, the formalism provides a powerful tool for defining complex probability measures,
including those which can be described by Bayesian nets and Hidden Markov models. The emphasis of the original
work by Sato and other PRISM related research seems to be on the use of the formalism for design and investigation
of efficient algorithms for statistical learning. The goal is to use the pair $DB = \langle F, R \rangle$ together with observations
of atoms from the language of DB to learn a suitable probability measure $P_F$.

P-log and PRISM share a substantial number of common features. Both are declarative languages capable of
representing and reasoning with logical and probabilistic knowledge. In both cases the logical part of the language is
rooted in logic programming. There are also substantial differences. PRISM seems to be primarily intended as "a
powerful tool for building complex statistical models" with an emphasis on using these models for statistical learning.
As a result PRISM allows infinite possible worlds, and has the ability of learning statistical parameters embedded 
in its inference mechanism. The goal of P-log designers was to develop a knowledge representation language 
allowing natural, elaboration tolerant representation of commonsense knowledge involving logic and probabilities. 
Infinite possible worlds and algorithms for statistical learning were not a priority. Instead the emphasis was on 
greater logical power provided by Answer Set Prolog, on causal interpretation of probability, and on the ability to 
perform and differentiate between various types of updates. In the near future we plan to use the PRISM ideas to 
expand the semantics of P-log to allow infinite possible worlds. Our more distant plans include investigation of 
possible adaptation of PRISM statistical learning algorithms to P-log. 



7.7 Other approaches 

So far we have discussed logic programming approaches to integrate logical and probabilistic reasoning. Besides
them, the paper (De Vos and Vermeir 2000) proposes a notion where the theory has two parts, a logic programming
part that can express preferences and a joint probability distribution. The probabilities are then used in determining
the priorities of the alternatives.

Besides the logic programming based approaches, there have been other approaches to combine logical and
probabilistic reasoning, such as probabilistic relational models (Koller 1999; Getoor et al. 2001), various probabilistic
first-order logics such as (Nilsson 1986; Bacchus 1990; Bacchus et al. 1996; Halpern 1990; Halpern 2003;
Pasula and Russell 2001; Poole 1993), approaches that assign a weight to first-order formulas (Paskin 2002;
Richardson and Domingos 2006), and first-order MDPs (Boutilier et al. 2001). In all these approaches the logic
parts are not very rich from the 'knowledge representation' angle. To start with, they use classical logic, which is
monotonic and hence has many drawbacks with respect to knowledge representation. A difference between first-order
MDPs and our approach is that actions, rewards and utilities are an inherent part of the former; one may encode
them in P-log though. In the next subsection we summarize specific differences between these approaches (and all
the other approaches that we mentioned so far) and P-log.

Footnote (Section 7.6): By an interpretation $I_F$ of F we mean an arbitrary subset of F. An atom $A \in F$ is true in $I_F$ iff $A \in I_F$.



7.8 Summary 

In summary, our focus in P-log has many broad differences with most of the earlier formalisms that have tried to 
integrate logical and probabilistic knowledge. We now list some of the main issues. 

• To the best of our knowledge P-log is the only probabilistic logic programming language which differentiates 
between doing and observing, which is useful for reasoning about causal relations. 

• P-log allows a relatively wide variety of updates compared with other approaches we surveyed. 

• Only P-log allows logical reasoning to dynamically decide on the range of values that a random variable can 
take. 

• P-log is the only language surveyed which allows a programmer to write a program which represents the
logical aspects of a problem and its possible worlds, and add causal probabilistic information to this program
as it becomes relevant and available.

• Our formalism allows the explicit specification of background knowledge and thus eliminates the difference
between implicit and explicit background knowledge that is pointed out in (Wang 2004) while discussing
the limitation of Bayesianism.

• As our formalization of the Monty Hall example shows, P-log can deal with non-trivial conditioning and is
able to encode the notion of protocols mentioned in Chapter 6 of (Halpern 2003).



8 Conclusion and Future Work 

In this paper we presented a non-monotonic probabilistic logic programming language, P-log, suitable for repre- 
senting logical and probabilistic knowledge. P-log is based on logic programming under answer set semantics, and 
on Causal Bayesian networks. We showed that it generalizes both languages. 

P-log comes with a natural mechanism for belief updating — the ability of the agent to change degrees of belief 
defined by his current knowledge base. We showed that conditioning of classical probability is a special case of this 
mechanism. In addition, P-log programs can be updated by actions, defaults and other logic programming rules, 
and by some forms of probabilistic information. The non-monotonicity of P-log allows us to model situations when 
new information forces the reasoner to change its collection of possible worlds, i.e. to move to a new probabilistic 
model of the domain. (This happens for instance when the agent's knowledge is updated by observation of an event 
deemed to be impossible under the current assumptions.) 

The expressive power of P-log and its ability to combine various forms of reasoning was demonstrated on a number 
of examples from the literature. The presentation of the examples is aimed to give a reader some feeling for the 
methodology of representing knowledge in P-log. Finally the paper gives sufficiency conditions for coherency of 
P-log programs and discusses the relationship of P-log with a number of other probabilistic logic programming 
formalisms. 

We plan to expand our work in several directions. First we need to improve the efficiency of the P-log inference 
engine. The current, naive implementation relies on computation of all answer sets of the logical part of a P-log
program. Even though it can efficiently reason with a surprising variety of interesting examples and puzzles, a more
efficient approach is needed to attack some other kinds of problems. We also would like to investigate the impact of 
replacing Answer Set Prolog — the current logical foundation of P-log — by a more powerful logic programming 
language, CR-Prolog. The new extension of P-log will be able to deal with updates which are currently viewed as
inconsistent. We plan to use P-log as a tool for the investigation of various forms of reasoning, including reasoning
with counterfactuals and probabilistic abductive reasoning capable of discovering most probable explanations of
unexpected observations. Finally, we plan to explore how statistical relational learning (SRL) can be done with 
respect to P-log and how P-log can be used to accommodate different kinds of uncertainties tackled by existing 
SRL approaches.

9 Appendix I: Proofs of major theorems

Our first goal in this section is to prove Theorem 1 from Section 6. We'll begin by proving a theorem which is
more general but whose hypothesis is more difficult to verify. In order to state and prove this general theorem, we 
need some terminology and lemmas. 

Definition 21 

Let T be a tree in which every arc is labeled with a real number in [0,1]. We say T is unitary if the labels of the
arcs leaving each node add up to 1. □

Figure 1 gives an example of a unitary tree.

Fig. 1. Unitary tree T

Definition 22

Let T be a tree with labeled nodes and n be a node of T. By $p_T(n)$ we denote the set of labels of nodes lying on
the path from the root of T to n, including the label of n and the label of the root. □



Example 30 

Consider the tree T from Figure[T] If n is the node labeled (13), then prin) = {1, 3, 8, 13}. 



□ 
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Definition 23 
[Path Value] 

Let T be a tree in which every arc is labeled with a number in [0,1]. The path value of a node n of T, denoted by
pv_T(n), is defined as the product of the labels of the arcs in the path to n from the root. (Note that the path value
of the root of T is 1.) □

When the tree T is obvious from the context we will simply write pv(n).

Example 31

Consider the tree T from Figure 1. If n is the node labeled (8), then pv(n) = 0.3 × 0.3 = 0.09. □

Lemma 1 

[Property of Unitary Trees] 

Let T be a unitary tree and n be a node of T. Then the sum of the path values of all the leaf nodes descended from
n (including n if n is a leaf) is the path value of n. □

Proof: We will prove that the conclusion holds for every unitary subtree of T containing n, by induction on the
number of nodes descended from n. Since T is a subtree of itself, the lemma will follow.

If n has only one node descended from it (including n itself if n is a leaf) then n is a leaf and the conclusion
holds trivially.

Consider a subtree S in which n has k nodes descended from it for some k > 0, and suppose the conclusion is true
for all subtrees where n has fewer than k descendants. Let l be a leaf node descended from n and let p be its parent.
Let S' be the subtree of S consisting of all of S except the children of p. By the induction hypothesis, the conclusion
is true of S'. Let c_1, ..., c_n be the children of p. The sum of the path values of leaves descended from n in S is
the same as that in S', except that pv(p) is replaced by pv(c_1) + ... + pv(c_n). Hence, we will be done if we can
show these are equal.

Let l_1, ..., l_n be the labels of the arcs leading to nodes c_1, ..., c_n respectively. Then pv(c_1) + ... + pv(c_n) =
l_1 * pv(p) + ... + l_n * pv(p) by definition of path value. Factoring out pv(p) gives pv(p) * (l_1 + ... + l_n). But
since S' is unitary, l_1 + ... + l_n = 1 and so this is just pv(p). □
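
The property stated in Lemma 1 can be checked numerically on a small example. The following is a minimal sketch, not part of the paper's formal development: the tuple encoding of trees, the function names, and the three-leaf tree itself are assumptions made for illustration (the tree is hypothetical and is not the tree of Figure 1). The unitary condition is checked at non-leaf nodes only.

```python
from fractions import Fraction as F

# A tree is encoded as (label, [(arc_label, subtree), ...]); a leaf has no arcs.
# This encoding is an illustrative assumption, not notation from the paper.

def is_unitary(tree):
    """True if the arc labels leaving every non-leaf node add up to 1."""
    _, arcs = tree
    if not arcs:
        return True
    return sum(w for w, _ in arcs) == 1 and all(is_unitary(c) for _, c in arcs)

def leaf_path_values(tree, pv):
    """Path values of all leaves at or below `tree`, whose own path value is `pv`."""
    _, arcs = tree
    if not arcs:
        return [pv]
    values = []
    for w, child in arcs:
        values.extend(leaf_path_values(child, pv * w))
    return values

# Hypothetical unitary tree: the root has children a and b; a has children c and d.
subtree_a = ("a", [(F(3, 10), ("c", [])), (F(7, 10), ("d", []))])
tree = ("root", [(F(3, 10), subtree_a), (F(7, 10), ("b", []))])

pv_a = F(3, 10)  # path value of node a: the label of the single arc from the root
leaves_below_a = leaf_path_values(subtree_a, pv_a)
```

Here the two leaves below a have path values 9/100 and 21/100, which add up to pv(a) = 3/10, as the lemma predicts.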

Let Π be a P-log program with signature Σ. Recall that τ(Π) denotes the translation of its logical part into an
Answer Set Prolog program. Similarly, for a literal l (in Σ) with respect to Π, τ(l) will represent the corresponding
literal in τ(Π). For example, τ(owner(d_1) = mike) = owner(d_1, mike). For a set of literals B (in Σ) with respect
to Π, τ(B) will represent the set {τ(l) | l ∈ B}.

Definition 24

A set S of literals of Π is Π-compatible with a literal l of Σ if there exists an answer set of τ(Π) containing
τ(S) ∪ {τ(l)}. Otherwise S is Π-incompatible with l. S is Π-compatible with a set B of literals of Π if there exists
an answer set of τ(Π) containing τ(S) ∪ τ(B); otherwise S is Π-incompatible with B. □

Definition 25 

A set S of literals is said to Π-guarantee a literal l if S and l are Π-compatible and every answer set of τ(Π)
containing τ(S) also contains τ(l); S Π-guarantees a set B of literals if S Π-guarantees every member of B. □
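
Definitions 24 and 25 can be read as simple set-theoretic tests over the answer sets of τ(Π). The sketch below assumes the answer sets are already available as Python sets of strings; it does not compute them, and the two answer sets shown are hypothetical and hand-written for illustration. The function names are likewise assumptions.

```python
def compatible(answer_sets, s, b):
    """S is compatible with B if some answer set contains every literal of S and B."""
    target = set(s) | set(b)
    return any(target <= a for a in answer_sets)

def guarantees(answer_sets, s, b):
    """S guarantees B if, for each l in B, S is compatible with l and every
    answer set containing S also contains l."""
    s = set(s)
    return all(
        compatible(answer_sets, s, {l})
        and all(l in a for a in answer_sets if s <= a)
        for l in b
    )

# Hypothetical answer sets, written by hand (not produced by a solver).
ANSWER_SETS = [
    {"owner(d1,mike)", "roll(d1,1)"},
    {"owner(d1,mike)", "roll(d1,2)"},
]
```

With these answer sets, the empty set guarantees owner(d1,mike) but not roll(d1,1), and {roll(d1,1)} is incompatible with {roll(d1,2)}.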
Definition 26

We say that B is a potential Π-cause of a(t) = y with respect to a rule r if Π contains rules of the form

[r] random(a(t) : {X : p(X)}) ← K.    (21)

and

pr_r(a(t) = y |_c B) = v.    (22)

□

Definition 27 
[Ready to branch] 

Let T be a tree whose nodes are labeled with literals and r be a rule of Π of the form

random(a(t) : {X : p(X)}) ← K.

or

random(a(t)) ← K.

where K can be empty. A node n of T is ready to branch on a(t) via r relative to Π if

1. p_T(n) contains no literal of the form a(t) = y for any y,

2. p_T(n) Π-guarantees K,

3. for every rule of the form pr_r(a(t) = y |_c B) = v in Π, either p_T(n) Π-guarantees B or is Π-incompatible
with B, and

4. if r is of the first form then for every y in the range of a(t), p_T(n) either Π-guarantees p(y) or is
Π-incompatible with p(y), and moreover there is at least one y such that p_T(n) Π-guarantees p(y).

If Π is obvious from context we may simply say that n is ready to branch on a(t) via r. □

Proposition 5 

Suppose n is ready to branch on a(t) via some rule r of Π, and a(t) = y is Π-compatible with p_T(n); and let W_1
and W_2 be possible worlds of Π compatible with p_T(n). Then P(W_1, a(t) = y) = P(W_2, a(t) = y). □

Proof: Suppose n is ready to branch on a(t) via some rule r of Π, and a(t) = y is Π-compatible with p_T(n); and
let W_1 and W_2 be possible worlds of Π compatible with p_T(n).

Case 1: Suppose a(t) = y has an assigned probability in W_1. Then there is a rule pr_r(a(t) = y |_c B) = v of
Π such that W_1 satisfies B. Since W_1 also satisfies p_T(n), B is Π-compatible with p_T(n). It follows from the
definition of ready-to-branch that p_T(n) Π-guarantees B. Since W_2 satisfies p_T(n) it must also satisfy B and so
P(W_2, a(t) = y) = v.

Case 2: Suppose a(t) = y does not have an assigned probability in W_1. Case 1 shows that the assigned
probabilities for values of a(t) in W_1 and W_2 are precisely the same; so a(t) = y has a default probability in both
worlds. We need only show that the possible values of a(t) are the same in W_1 and W_2. Suppose then that for
some y, a(t) = y is possible in W_1. Then W_1 satisfies p(y). Hence, since W_1 satisfies p_T(n), we have that p_T(n)
is Π-compatible with p(y). By definition of ready-to-branch, it follows that p_T(n) Π-guarantees p(y). Now since
W_2 satisfies p_T(n) it must also satisfy p(y) and hence a(t) = y is possible in W_2. The other direction is the same.

□

Suppose n is ready to branch on a(t) via some rule r of Π, and a(t) = y is Π-compatible with p_T(n), and W
is a possible world of Π compatible with p_T(n). We may refer to P(W, a(t) = y) as v(n, a(t), y). Though the
latter notation does not mention W, it is well defined by Proposition 5.

Fig. 2. T_2: The tree corresponding to the dice P-log program Π_2

Example 32 
[Ready to branch] 

Consider the following version of the dice example. Let us refer to it as Π_2:

dice = {d_1, d_2}.
score = {1, 2, 3, 4, 5, 6}.
person = {mike, john}.
roll : dice → score.
owner : dice → person.
owner(d_1) = mike.
owner(d_2) = john.

even(D) ← roll(D) = Y, Y mod 2 = 0.

¬even(D) ← not even(D).

[ r(D) ] random(roll(D)).

pr(roll(D) = Y |_c owner(D) = john) = 1/6.
pr(roll(D) = 6 |_c owner(D) = mike) = 1/4.
pr(roll(D) = Y |_c Y ≠ 6, owner(D) = mike) = 3/20.
where D ranges over {d_1, d_2}.

Now consider the tree T_2 of Figure 2. Let us refer to the root of this tree as n_1, the node roll(d_1) = 1 as n_2,
and the node roll(d_2) = 2 connected to n_2 as n_3. Then p_{T_2}(n_1) = {true}, p_{T_2}(n_2) = {true, roll(d_1) = 1},
and p_{T_2}(n_3) = {true, roll(d_1) = 1, roll(d_2) = 2}. The set {true} of literals Π_2-guarantees {owner(d_1) =
mike, owner(d_2) = john} and is Π_2-incompatible with {owner(d_1) = john, owner(d_2) = mike}. Hence n_1 and
the attribute roll(d_1) satisfy condition 3 of Definition 27. Similarly for roll(d_2). Other conditions of the definition
hold vacuously and therefore n_1 is ready to branch on roll(D) via r(D) relative to Π_2 for D ∈ {d_1, d_2}. It is also
easy to see that n_2 is ready to branch on roll(d_2) via r(d_2), and that n_3 is not ready to branch on any attribute of
Π_2. □
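
To make the tree T_2 concrete, the arc labels and leaf path values of the dice program can be computed directly from its pr-atoms. This is a minimal sketch under stated assumptions: it hard-codes the ownership facts and the three pr-atoms of Π_2 rather than interpreting P-log, it fixes the branching order roll(d_1) then roll(d_2), and the names OWNER, arc_label, and leaves are illustrative, not from the paper.

```python
from fractions import Fraction as F
from itertools import product

OWNER = {"d1": "mike", "d2": "john"}  # owner(d_1) = mike, owner(d_2) = john
SCORES = range(1, 7)

def arc_label(die, score):
    """Probability attached to the arc for roll(die) = score, read off the pr-atoms."""
    if OWNER[die] == "john":
        return F(1, 6)
    return F(1, 4) if score == 6 else F(3, 20)

def leaves():
    """Map each leaf (roll(d_1), roll(d_2)) to its path value in the two-level tree."""
    return {
        (y1, y2): arc_label("d1", y1) * arc_label("d2", y2)
        for y1, y2 in product(SCORES, SCORES)
    }

LEAVES = leaves()
```

The leaf reached via roll(d_1) = 1 and roll(d_2) = 2 (node n_3 above) gets path value 3/20 × 1/6 = 1/40, and the 36 leaf path values add up to 1.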



Definition 28 
[Expanding a node] 

In case n is ready to branch on a(t) via some rule of Π, the Π-expansion of T at n by a(t) is a tree obtained
from T as follows: for each y such that p_T(n) is Π-compatible with a(t) = y, add an arc leaving n, labeled with
v(n, a(t), y), and terminating in a node labeled with a(t) = y. We say that n branches on a(t). □

Definition 29
[Expansions of a tree]

A zero-step Π-expansion of T is T. A one-step Π-expansion of T is an expansion of T at one of its leaves by
some attribute term a(t). For n > 1, an n-step Π-expansion of T is a one-step Π-expansion of an (n - 1)-step
Π-expansion of T. A Π-expansion of T is an n-step Π-expansion of T for some non-negative integer n. □

For instance, the tree consisting of the top two layers of tree T_2 from Figure 2 is a Π_2-expansion of the one-node
tree n_1 by roll(d_1).

Definition 30 

A seed is a tree with a single node labeled true. □ 



Definition 31 
[Tableau] 

A tableau of Π is a Π-expansion of a seed which is maximal with respect to the subtree relation. □

For instance, the tree T_2 of Figure 2 is a tableau of Π_2.

Definition 32
[Node Representing a Possible World]

Suppose T is a tableau of Π. A possible world W of Π is represented by a leaf node n of T if W is the set of
literals Π-guaranteed by p_T(n). □

For instance, the node n_3 of T_2 represents the possible world

{owner(d_1, mike), owner(d_2, john), roll(d_1, 1), roll(d_2, 2), ¬even(d_1), even(d_2)}.

Definition 33
[Tree Representing a Program]

If every possible world of Π is represented by exactly one leaf node of T, and every leaf node of T represents
exactly one possible world of Π, then we say T represents Π. □

It is easy to check that the tree T_2 represents Π_2.



Definition 34 
[Probabilistic Soundness] 

Suppose Π is a P-log program and T is a tableau representing Π, such that R is a mapping from the possible worlds
of Π to the leaf nodes of T which represent them. If for every possible world W of Π we have

pv_T(R(W)) = μ(W)

i.e. the path value in T of R(W) is equal to the probability of W, then we say that the representation of Π by T
is probabilistically sound. □

The following theorem gives conditions sufficient for the coherency of P-log programs. (Recall that we only
consider programs satisfying Conditions 1, 2, and 3 of Section 3.2.) It will later be shown that all unitary, causally
ordered programs satisfy the hypothesis of this theorem, establishing Theorem 1.



Theorem 3 

[Coherency Condition] 

Suppose Π is a consistent P-log program such that P_Π is defined. Let Π' be obtained from Π by removing all
observations and actions. If there exists a unitary tableau T representing Π', and this representation is probabilistically
sound, then for every pair of rules

[r] random(a(t) : {Y : p(Y)}) ← K.    (23)

and

pr_r(a(t) = y |_c B) = v.    (24)

of Π' such that P_{Π'}(B ∪ K) > 0 we have

P_{Π' ∪ obs(B) ∪ obs(K)}(a(t) = y) = v

Hence Π is coherent. □



Proof: For any set S of literals, let lgar(S) (pronounced "L-gar" for "leaves guaranteeing") be the set of leaves n
of T such that p_T(n) Π'-guarantees S.

Let μ denote the measure on possible worlds induced by Π'. Let Ω be the set of possible worlds of Π' ∪ obs(B) ∪
obs(K). Since P_{Π'}(B ∪ K) > 0 we have

$$P_{\Pi' \cup obs(B) \cup obs(K)}(a(t) = y) = \frac{\sum_{\{W \,:\, W \in \Omega \,\wedge\, a(t) = y \,\in\, W\}} \mu(W)}{\sum_{\{W \,:\, W \in \Omega\}} \mu(W)} \qquad (25)$$

Now, let

$$\alpha = \sum_{n \in lgar(B \cup K \cup \{a(t) = y\})} pv(n), \qquad \beta = \sum_{n \in lgar(B \cup K)} pv(n)$$

Since T is a probabilistically sound representation of Π', the right-hand side of (25) can be written as α/β. So we
will be done if we can show that α/β = v.

We first claim

Every n ∈ lgar(B ∪ K) has a unique ancestor ga(n) which branches on a(t) via r (23).    (26)

If existence failed for some leaf n then n would be ready to branch on a(t), which contradicts maximality of the
tree. Uniqueness follows from Condition 1 of Definition 27.

Next, we claim the following:

For every n ∈ lgar(B ∪ K), p_T(ga(n)) Π'-guarantees B ∪ K.    (27)

Let n ∈ lgar(B ∪ K). Since ga(n) branches on a(t), ga(n) must be ready to Π'-expand using a(t). So by (2)
and (3) of the definition of ready-to-branch, p_T(ga(n)) either Π'-guarantees B or is Π'-incompatible with B. But
p_T(ga(n)) ⊆ p_T(n), and p_T(n) Π'-guarantees B, so p_T(ga(n)) cannot be Π'-incompatible with B. Hence
p_T(ga(n)) Π'-guarantees B. It is also easy to see that p_T(ga(n)) Π'-guarantees K.

From (27), it follows easily that

If n ∈ lgar(B ∪ K), every leaf descended from ga(n) belongs to lgar(B ∪ K).    (28)

Let

A = {ga(n) : n ∈ lgar(B ∪ K)}

In light of (26) and (28), we have

lgar(B ∪ K) is precisely the set of leaves descended from nodes in A.    (29)

Therefore,

$$\beta = \sum_{n \text{ is a leaf descended from some } a \in A} pv(n)$$

Moreover, by construction of T, no leaf may have more than one ancestor in A, and hence

$$\beta = \sum_{a \in A} \; \sum_{n \text{ is a leaf descended from } a} pv(n)$$

Now, by Lemma 1 on unitary trees, since T is unitary,

$$\beta = \sum_{a \in A} pv(a)$$

This way of writing β will help us complete the proof. Now for α.
Recall the definition of α:

$$\alpha = \sum_{n \in lgar(B \cup K \cup \{a(t) = y\})} pv(n)$$

Denote the index set of this sum by lgar(B, K, y). Let

A_y = {n : parent(n) ∈ A, the label of n is a(t) = y}

Since lgar(B, K, y) is a subset of lgar(B ∪ K), (29) implies that lgar(B, K, y) is precisely the set of leaves
descended from nodes in A_y. Hence

$$\alpha = \sum_{n' \text{ is a leaf descended from some } n \in A_y} pv(n')$$

Again, no leaf may descend from more than one node of A_y, and so by the lemma on unitary trees,

$$\alpha = \sum_{n \in A_y} \; \sum_{n' \text{ is a leaf descended from } n} pv(n') = \sum_{n \in A_y} pv(n) \qquad (30)$$

Finally, we claim that every node n in A has a unique child in A_y, which we will label ychild(n). The existence
and uniqueness follow from (27), along with Condition 3 of Section 3.2, and the fact that every node in A branches
on a(t) via [r]. Thus from (30) we obtain

$$\alpha = \sum_{n \in A} pv(ychild(n))$$

Note that if n ∈ A, the arc from n to ychild(n) is labeled with v. Now we have:

$$P_{\Pi' \cup obs(B) \cup obs(K)}(a(t) = y) = \alpha / \beta = \sum_{n \in A} pv(ychild(n)) \Big/ \sum_{n \in A} pv(n) = \sum_{n \in A} pv(n) \cdot v \Big/ \sum_{n \in A} pv(n) = v.$$

□
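
The ratio α/β from the proof can be computed on a toy tableau. The sketch below is a hedged illustration, not the paper's construction: the two-attribute tree (the root branches on c, then every child branches on a) is hypothetical, and "p_T(n) guarantees S" is simplified to "S is a subset of the labels on the path to n", which is an assumption that is adequate only because this toy program has no other rules. Here B = {c = t}, K is empty, y = 1, and the assumed pr-atom value is v = 1/5.

```python
from fractions import Fraction as F

# Hypothetical unitary tableau: arc label for each value of c, then arc labels
# for each value of a below that node.
ARCS = {
    "c=t": (F(1, 2), {"a=1": F(1, 5), "a=2": F(4, 5)}),
    "c=f": (F(1, 2), {"a=1": F(1, 2), "a=2": F(1, 2)}),
}

def leaf_path_values():
    """Map each leaf, identified by the set of labels on its path, to its path value."""
    return {
        frozenset({c, a}): wc * wa
        for c, (wc, kids) in ARCS.items()
        for a, wa in kids.items()
    }

def lgar(literals):
    """Leaves whose path labels include all the given literals (simplified 'guarantees')."""
    wanted = set(literals)
    return {p: pv for p, pv in leaf_path_values().items() if wanted <= p}

alpha = sum(lgar({"c=t", "a=1"}).values())  # leaves guaranteeing B, K and a = 1
beta = sum(lgar({"c=t"}).values())          # leaves guaranteeing B and K
```

In this example α = 1/10 and β = 1/2, so α/β = 1/5, the value v attached to the arc for a = 1 below the node c = t.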

Proposition 6 

[Tableau for causally ordered programs] 

Suppose Π is a causally ordered P-log program; then there exists a tableau T of Π which represents Π. □
Proof:

Let | | be a causal order of Π, a_1(t_1), ..., a_m(t_m) be the ordering of its terms induced by | |, and Π_1, ..., Π_{m+1}
be the | |-induced structure of Π.

Consider a sequence T_0, ..., T_m of trees where T_0 is a tree with one node, n_0, labeled by true, and T_i is obtained
from T_{i-1} by expanding every leaf of T_{i-1} which is ready to branch on a_i(t_i) via any rule relative to Π_i by this
term. Let T = T_m. We will show that T_m is a tableau of Π which represents Π.

Our proof will unfold as a sequence of lemmas: 
Lemma 2 

For every k ≥ 0 and every leaf node n of T_k, program Π_{k+1} has a unique possible world W containing p_{T_k}(n).
□
Proof:

We use induction on k. The case where k = 0 follows from Condition (1) of Definition 13 of causally ordered
program. Assume that the lemma holds for i = k - 1 and consider a leaf node n of T_k. By construction of T,
there exists a leaf node m of T_{k-1} which is either the parent of n or equal to n. By inductive hypothesis there is a
unique possible world V of Π_k containing p_{T_{k-1}}(m).

(i) First we will show that every possible world W of Π_{k+1} containing p_{T_{k-1}}(m) also contains V. By the splitting
set theorem (Lifschitz and Turner 1994), the set V' of literals of W that belong to L_k is a possible world of Π_k.
Obviously, p_{T_{k-1}}(m) ⊆ V'. By inductive hypothesis, V' = V, and hence V ⊆ W.

Now let us consider two cases. 

(ii) a_k(t_k) is not active in V with respect to Π_{k+1}. In this case for every random selection rule of Π_{k+1} either
Condition (2) or Condition (4) of Definition 27 is not satisfied and hence there is no rule r such that m is ready
to branch on a_k(t_k) via r relative to Π_{k+1}. From construction of T_k we have that m = n. By (3) of the
definition of causally ordered, the program V ∪ Π_{k+1} has exactly one possible world, W. Since L_k is a splitting set
(Lifschitz and Turner 1994) of Π_{k+1} we can use the splitting set theorem to conclude that W is a possible world of
Π_{k+1}. Obviously, W contains V and hence p_{T_{k-1}}(m). Since n = m this implies that W contains p_{T_k}(n).

Uniqueness follows immediately from (i) and Condition (3) of Definition 13.

(iii) A term a_k(t_k) is active in V. This means that there is some random selection rule r

[r] random(a_k(t_k) : {Y : p(Y)}) ← K.

such that V satisfies K and there is y_0 such that p(y_0) ∈ V. (If r does not contain p the latter condition can be
simply omitted.) Recall that in this case a_k(t_k) = y_0 is possible in V with respect to Π_{k+1}.

We will show that m is ready to branch on a_k(t_k) via rule r relative to Π_{k+1}.

Condition (1) of the definition of "ready to branch" (Definition 27) follows immediately from construction of T_{k-1}.

To prove Condition (2) we need to show that p_{T_{k-1}}(m) Π_{k+1}-guarantees K. To see that p_{T_{k-1}}(m) and K are
Π_{k+1}-compatible notice that, from Condition (2) of Definition 13 and the fact that p(y_0) ∈ V, we have that
V ∪ Π_{k+1} has a possible world, say, W_0. Obviously it satisfies both K and p_{T_{k-1}}(m). Now consider a possible
world W of Π_{k+1} which contains p_{T_{k-1}}(m). By (i) we have that V ⊆ W. Since V satisfies K so does W.
Condition (2) of the definition of ready to branch is satisfied.

To prove Condition (3) consider pr_r(a_k(t_k) = y |_c B) = v from Π_{k+1} such that B is Π_{k+1}-compatible with
p_{T_{k-1}}(m). Π_{k+1}-compatibility implies that there is a possible world W_0 of Π_{k+1} which contains both p_{T_{k-1}}(m)
and B. By (i) we have that V ⊆ W_0 and hence V satisfies B. Since every possible world W of Π_{k+1} containing
p_{T_{k-1}}(m) also contains V we have that W satisfies B, which proves Condition (3) of the definition.

To prove Condition (4) we consider y_0 such that p(y_0) ∈ V (the existence of such y_0 is proven at the beginning
of (iii)). We show that p_{T_{k-1}}(m) Π_{k+1}-guarantees p(y_0). Since a_k(t_k) = y_0 is possible in V with respect to
Π_{k+1}, Condition (2) of Definition 13 guarantees that Π_{k+1} has a possible world, say W, containing V. By
construction, p(y_0) ∈ V and hence p(y_0) and p_{T_{k-1}}(m) are Π_{k+1}-compatible. From (i) we have that p_{T_{k-1}}(m)
Π_{k+1}-guarantees p(y_0). A similar argument shows that if p_{T_{k-1}}(m) is Π_{k+1}-compatible with p(y) then p(y) is also
Π_{k+1}-guaranteed by p_{T_{k-1}}(m).

We can now conclude that m is ready to branch on a_k(t_k) via rule r relative to Π_{k+1}. This implies that a leaf node
n of T_k is obtained from m by expanding it by an atom a_k(t_k) = y.

By Condition (2) of Definition 13, program V ∪ Π_{k+1} ∪ obs(a_k(t_k) = y) has exactly one possible world, W.
Since L_k is a splitting set of Π_{k+1} we have that W is a possible world of Π_{k+1}. Clearly W contains p_{T_k}(n).
Uniqueness follows immediately from (i) and Condition (2) of Definition 13.

Lemma 3

For all k ≥ 0, every possible world of Π_{k+1} contains p_{T_k}(n) for some unique leaf node n of T_k. □
Proof: 

We use induction on k. The case where k = 0 is immediate. Assume that the lemma holds for i = k - 1, and
consider a possible world W of Π_{k+1}. By the splitting set theorem W is a possible world of V ∪ Π_{k+1} where V
is a possible world of Π_k. By the inductive hypothesis there is a unique leaf node m of T_{k-1} such that V contains
p_{T_{k-1}}(m). Consider two cases.

(a) The attribute term a_k(t_k) is not active in V and hence m is not ready to branch on a_k(t_k). This means that
m is a leaf of T_k and p_{T_{k-1}}(m) = p_{T_k}(m). Let n = m. Since V ⊆ W we have that p_{T_k}(n) ⊆ W. To show
uniqueness suppose n' is a leaf node of T_k such that p_{T_k}(n') ⊆ W, and n' is not equal to n. By construction of
T_k there is some j and some y_1 ≠ y_2 such that a_j(t_j) = y_1 ∈ p_{T_k}(n') and a_j(t_j) = y_2 ∈ p_{T_k}(n). Since W is
consistent and a_j is a function we can conclude n cannot differ from n'.

(b) If a_k(t_k) is active in V then there is a possible outcome y of a_k(t_k) in V with respect to Π_{k+1} via some random
selection rule r such that a_k(t_k) = y ∈ W. By inductive hypothesis V contains p_{T_{k-1}}(m) for some leaf m of
T_{k-1}. Repeating the argument from part (iii) of the proof of Lemma 2 we can show that m is ready to branch
on a_k(t_k) via r relative to Π_{k+1}. Since a_k(t_k) = y is possible in V there is a son n of m in T_k labeled by
a_k(t_k) = y. It is easy to see that W contains p_{T_k}(n). The proof of uniqueness is similar to that used in (a).

Lemma 4 

For every leaf node n of T_{i-1}, every set B of extended literals of L_i, and every i ≤ j ≤ m + 1 we have

p_{T_{i-1}}(n) is Π_i-compatible with B iff p_{T_{i-1}}(n) is Π_j-compatible with B. □

Proof:

Suppose that p_{T_{i-1}}(n) is Π_i-compatible with B. This means that there is a possible world V of Π_i which satisfies
p_{T_{i-1}}(n) and B. To construct a possible world of Π_j with the same property consider a leaf node m of T_{j-1}
belonging to a path containing node n of T_{i-1}. By Lemma 2, Π_j has a unique possible world W containing
p_{T_{j-1}}(m). L_i is a splitting set of Π_j and hence, by the splitting set theorem, we have that W = V' ∪ U where V'
is a possible world of Π_i and U ∩ L_i = ∅. This implies that V' contains p_{T_{i-1}}(n), and hence, by Lemma 2, V' = V.
Since V satisfies B and U ∩ L_i = ∅ we have that W also satisfies B and hence p_{T_{i-1}}(n) is Π_j-compatible with
B.

Let W be a possible world of Π_j satisfying p_{T_{i-1}}(n) and B. By the splitting set theorem we have that W = V ∪ U
where V is a possible world of Π_i and U ∩ L_i = ∅. Since B and p_{T_{i-1}}(n) belong to the language of L_i we have
that B and p_{T_{i-1}}(n) are satisfied by V and hence p_{T_{i-1}}(n) is Π_i-compatible with B.

Lemma 5 

For every leaf node n of T_{i-1}, every set B of extended literals of L_i, and every i ≤ j ≤ m + 1 we have
p_{T_{i-1}}(n) Π_i-guarantees B iff p_{T_{i-1}}(n) Π_j-guarantees B. □

Proof:

Let us assume that p_{T_{i-1}}(n) Π_i-guarantees B. This implies that p_{T_{i-1}}(n) is Π_i-compatible with B, and hence,
by Lemma 4, p_{T_{i-1}}(n) is Π_j-compatible with B. Now let W be a possible world of Π_j satisfying p_{T_{i-1}}(n). By
the splitting set theorem W = V ∪ U where V is a possible world of Π_i and U ∩ L_i = ∅. This implies that V
satisfies p_{T_{i-1}}(n). Since p_{T_{i-1}}(n) Π_i-guarantees B we also have that V satisfies B. Finally, since U ∩ L_i = ∅ we
can conclude that W satisfies B.

Suppose now that p_{T_{i-1}}(n) Π_j-guarantees B. This implies that p_{T_{i-1}}(n) is Π_i-compatible with B. Now let V be
a possible world of Π_i containing p_{T_{i-1}}(n). To show that V satisfies B let us consider a leaf node m of a path of
T_{j-1} containing n. By Lemma 2, Π_j has a unique possible world W containing p_{T_{j-1}}(m). By construction, W
also contains p_{T_{i-1}}(n) and hence satisfies B. By the splitting set theorem W = V' ∪ U where V' is a possible
world of Π_i and U ∩ L_i = ∅. Since B belongs to the language of L_i it is satisfied by V'. By Lemma 2, V' = V.
Thus V satisfies B and we conclude p_{T_{i-1}}(n) Π_i-guarantees B.

Lemma 6 

For every i ≤ j ≤ m + 1 and every leaf node n of T_{i-1}, n is ready to branch on term a_i(t_i) relative to Π_i iff n is
ready to branch on a_i(t_i) relative to Π_j. □

Proof:

Condition (1) of Definition 27 follows immediately from construction of the T's. To prove Condition (2) consider a leaf
node n of T_{i-1} which is ready to branch on a_i(t_i) relative to Π_i. This means that Π_i contains a random selection
rule r whose body K is Π_i-guaranteed by p_{T_{i-1}}(n). By definition of L_i, the extended literals from K belong to the
language L_i and hence, by Lemma 5, p_{T_{i-1}}(n) Π_j-guarantees K.

Now consider a set B of extended literals from Condition (3) of Definition 27 and assume that p_{T_{i-1}}(n) is
Π_j-compatible with B. To show that p_{T_{i-1}}(n) Π_j-guarantees B note that, by Lemma 4, p_{T_{i-1}}(n) is Π_i-compatible
with B. Since n is ready to branch on a_i(t_i) relative to Π_i we have that p_{T_{i-1}}(n) Π_i-guarantees B. By Lemma
5 we have that p_{T_{i-1}}(n) Π_j-guarantees B and hence Condition (3) of Definition 27 is satisfied. Condition (4) is
similar to check.

For the converse, Condition (1) is immediate as before. To prove Condition (2) consider a leaf node n of T_{i-1} which is ready to
branch on a_i(t_i) relative to Π_j. This means that p_{T_{i-1}}(n) Π_j-guarantees K for some rule r from Π_j. Since Π_j
is causally ordered we have that r belongs to Π_i. By Lemma 5, p_{T_{i-1}}(n) Π_i-guarantees K. A similar proof can be
used to establish Conditions (3) and (4).

Lemma 7 

T = T_m is a tableau for Π = Π_{m+1}. □
Proof:

Follows immediately from the construction of the T's and Π's, the definition of a tableau, and Lemmas 6 and 4. □

Lemma 8

T = T_m represents Π = Π_{m+1}. □
Proof:

Let W be a possible world of Π. By Lemma 3, W contains p_T(n) for some unique leaf node n of T. By Lemma 2,
W is the set of literals Π-guaranteed by p_T(n), and hence W is represented by n. Suppose now that n' is a node
of T representing W. Then p_T(n') Π-guarantees W, which implies that W contains p_T(n'). By Lemma 3 this
means that n = n', and hence we proved that every answer set of Π is represented by exactly one leaf node of T.

Now let n be a leaf node of T. By Lemma 2, Π has a unique possible world W containing p_T(n). It is easy to see
that W is the set of literals represented by n. □

Lemma 9

Suppose T is a tableau representing Π. If n is a node of T which is ready to branch on a(t) via r, then all
possible worlds of Π compatible with p_T(n) are probabilistically equivalent with respect to r. □

Proof:

This is immediate from Conditions (3) and (4) of the definition of ready-to-branch.

Notation: If n is a node of T which is ready to branch on a(t) via r, then Lemma 9 guarantees that there is a unique
scenario for r containing all possible worlds compatible with p_T(n). We will refer to this scenario as the scenario
determined by n.

We are now ready to prove the main theorem. 
Theorem 1

Every causally ordered, unitary program is coherent.
Proof:

Suppose Π is causally ordered and unitary. Proposition 6 tells us that Π is represented by some tableau T. By
Theorem 3 we need only show that T is unitary, i.e., that for every node n of T, the sum of the labels of the arcs
leaving n is 1. Let n be a node and let s be the scenario determined by n. s satisfies (1) or (2) of Definition 14. In
case (1) is satisfied, the definition of v(n, a(t), y), along with the construction of the labels of arcs of T, guarantees
that the sum of the labels of the arcs leaving n is 1. In case (2) is satisfied, the conclusion follows from the same
considerations, along with the definition of PD(W, a(t) = y).

We now restate and prove Theorem 2.

Theorem 2

Let x_1, ..., x_n be a nonempty vector of random variables, under a classical probability P, taking finitely many
values each. Let R_i be the set of possible values of each x_i, and assume R_i is nonempty for each i. Then there
exists a coherent P-log program Π with random attributes x_1, ..., x_n such that for every vector r_1, ..., r_n from
R_1 × ... × R_n, we have

P(x_1 = r_1, ..., x_n = r_n) = P_Π(x_1 = r_1, ..., x_n = r_n)    (31)

□

Proof: 

For each i let pars(x_i) = {x_1, ..., x_{i-1}}. Let Π be formed as follows: For each x_i, Π contains

x_i : R_i.

random(x_i).

Also, for each x_i, every possible value y of x_i, and every vector of possible values yp of pars(x_i), let Π contain

pr(x_i = y |_c pars(x_i) = yp) = v(i, y, yp)
where v(i, y, yp) = P(x_i = y | pars(x_i) = yp).

Construct a tableau T for Π as follows: Beginning with the root which has depth 0, for every node n at depth i and
every possible value y of x_{i+1}, add an arc leaving n, terminating in a node labeled x_{i+1} = y; label the arc with

P(x_{i+1} = y | p_T(n)).

We first claim that T is unitary. This follows from the construction of T and basic probability theory, since the
labels of the arcs leaving any node n at depth i are the respective conditional probabilities, given p_T(n), of all
possible values of x_{i+1}.

We now claim that T represents Π. Each answer set of τ(Π), the translation of Π into Answer Set Prolog, satisfies
x_1 = r_1, ..., x_n = r_n for exactly one vector r_1, ..., r_n in R_1 × ... × R_n, and every such vector is satisfied in
exactly one answer set. For the answer set S satisfying x_1 = r_1, ..., x_n = r_n, let M(S) be the leaf node n of T
such that p_T(n) = {x_1 = r_1, ..., x_n = r_n}. M(S) represents S by Definition 32 since Π has no non-random
attributes. Since M is a one-to-one correspondence, T represents Π. (31) holds because

P(x_1 = r_1, ..., x_n = r_n)

= P(x_1 = r_1) × P(x_2 = r_2 | x_1 = r_1) × ... × P(x_n = r_n | x_1 = r_1, ..., x_{n-1} = r_{n-1})
= v(1, r_1, ∅) × ... × v(n, r_n, (r_1, ..., r_{n-1}))
= P_Π(x_1 = r_1, ..., x_n = r_n)

To complete the proof we will use Theorem 3 to show that Π is coherent. Π trivially satisfies the Unique selection
rule. The Unique probability assignment rule is satisfied because pars(x_i) cannot take on two different values
yp_1 and yp_2 in the same answer set. Π is consistent because by assumption 1 ≤ n and R_i is nonempty. For the same
reason, P_Π is defined. Π contains no do or obs literals; so we can apply Theorem 3 directly to Π without removing
anything. We have shown that T is unitary and represents Π. The representation is probabilistically sound by the
construction of T. These are all the things that need to be checked to apply Theorem 3 to show that Π is coherent.
□
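
The chain-rule step behind (31) can be checked numerically. The following is a minimal sketch, assuming a small hypothetical joint distribution over two binary variables in which every outcome has positive probability (so that no conditional probability divides by zero). The function names marginal, v, and path_value are illustrative; v mirrors the quantity v(i, y, yp) from the proof, with yp given as a tuple of values for x_1, ..., x_{i-1}.

```python
from fractions import Fraction as F

def marginal(joint, prefix):
    """P(x_1, ..., x_k = prefix), summing the joint over the remaining variables."""
    k = len(prefix)
    return sum(p for vec, p in joint.items() if vec[:k] == prefix)

def v(joint, i, y, yp):
    """v(i, y, yp) = P(x_i = y | x_1, ..., x_{i-1} = yp), with i counted from 1."""
    assert len(yp) == i - 1
    return marginal(joint, yp + (y,)) / marginal(joint, yp)

def path_value(joint, vec):
    """Product of the arc labels on the tableau path to the leaf for `vec`."""
    pv = F(1)
    for i, y in enumerate(vec, start=1):
        pv *= v(joint, i, y, vec[:i - 1])
    return pv

# Hypothetical joint distribution P over (x_1, x_2); all entries are positive.
JOINT = {
    (0, 0): F(1, 2),
    (0, 1): F(1, 4),
    (1, 0): F(1, 8),
    (1, 1): F(1, 8),
}
```

For the vector (0, 1), the arc labels are P(x_1 = 0) = 3/4 and P(x_2 = 1 | x_1 = 0) = 1/3, whose product 1/4 equals the joint probability, and the same holds for every other vector.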

Finally we give a proof of Proposition 1.

Proposition 7

Let T be a P-log program over signature Σ not containing pr-atoms, and B a collection of Σ-literals. If

1. all random selection rules of T are of the form random(a(t)),

2. T ∪ obs(B) is coherent, and

3. for every term a(t) appearing in literals from B program T contains a random selection rule random(a(t)),

then for every formula A

P_{T ∪ B}(A) = P_{T ∪ obs(B)}(A)

□

Proof: 

We will need some terminology. Answer Set Prolog programs Π_1 and Π_2 are called equivalent (symbolically,
Π_1 ≡ Π_2) if they have the same answer sets; Π_1 and Π_2 are called strongly equivalent (symbolically, Π_1 ≡_s Π_2)
if for every program Π we have that Π_1 ∪ Π ≡ Π_2 ∪ Π. To simplify the presentation let us consider a program
T' = T ∪ B ∪ obs(B). Using the splitting set theorem it is easy to show that W is a possible world of T ∪ B iff
W ∪ obs(B) is a possible world of T'. To show

(1) P_{T ∪ B}(A) = P_{T ∪ obs(B)}(A).

we notice that, since T', T ∪ B and T ∪ obs(B) have the same probabilistic parts and the same collections of
do-atoms, to prove (1) it suffices to show that

(2) W is a possible world of T' iff W is a possible world of T ∪ obs(B).

Let P_B = τ(T') and P_{obs(B)} = τ(T ∪ obs(B)). By definition of possible worlds (2) holds iff

(3) P_B ≡ P_{obs(B)}

To prove (3) let us first notice that the set of literals S formed by relations do, obs, and intervene forms a splitting
set of programs P_B and P_{obs(B)}. Both programs include the same collection of rules whose heads belong to this
splitting set. Let X be the answer set of this collection and let Q_B and Q_{obs(B)} be partial evaluations of P_B and
P_{obs(B)} with respect to X and S. From the splitting set theorem we have that (3) holds iff

(4) Q_B ≡ Q_{obs(B)}.

To prove (4) we will show that for every literal l ∈ B there are sets U_1(l) and U_2(l) such that for some Q

(5) Q_{obs(B)} = Q ∪ {r : r ∈ U_1(l) for some l ∈ B},

(6) Q_B = Q ∪ {r : r ∈ U_2(l) for some l ∈ B},

(7) U_1(l) ≡_s U_2(l)

which will imply (4).

Let literal Z G 5 be formed by an attribute a(I). Consider two cases: 
Case 1: intervene{a{t)) ^ X. 
Let Ui{l) consist of the rules 

(a) -a(I, Fi) ^ a{t, Y2), Yi ^ ¥2. 

(b) a(t,yi)oi- ... or a(i,yk). 

(c) <— not I. 

Let U2{1) Ui{l) U B. 

It is easy to see that, due to the restrictions on random selection rules of $T$ from the proposition, $U_1(l)$ belongs to
the partial evaluation of $\tau(T)$ with respect to $X$ and $S$. Hence $U_1(l) \subseteq Q_{obs(B)}$. Similarly $U_2(l) \subseteq Q_B$, and hence
$U_1(l)$ and $U_2(l)$ satisfy conditions (5) and (6) above. To show that they satisfy condition (7) we use the method
developed in (Lifschitz et al. 2001). First we reinterpret the connectives of statements of $U_1(l)$ and $U_2(l)$. In the new
interpretation $\neg$ will be the strong negation of Nelson (Nelson 1949); not, $\leftarrow$, and or will be interpreted as intuitionistic
negation, implication, and disjunction respectively; the comma will stand for $\wedge$. A program $P$ with connectives reinterpreted
in this way will be referred to as the NL counterpart of $P$. Note that the NL counterpart of $\leftarrow$ not $l$ is not not $l$. Next
we will show that, under this interpretation, $U_1(l)$ and $U_2(l)$ are equivalent in Nelson's intuitionistic logic (NL).
Symbolically,

(8) $U_1(l) \equiv_{NL} U_2(l)$.

(Roughly speaking this means that $U_1(l)$ can be derived from $U_2(l)$ and $U_2(l)$ from $U_1(l)$ without the use of the
law of the excluded middle.) As shown in (Lifschitz et al. 2001), two programs whose NL counterparts are equivalent
in NL are strongly equivalent, which implies (7).

To show (8) it suffices to show that

(9) $U_1(l) \vdash_{NL} l$.

If $l$ is of the form $a(\bar{t}, y_i)$ then let us assume $a(\bar{t}, y_j)$ where $j \neq i$. This, together with the NL counterpart of rule
(a), derives $\neg a(\bar{t}, y_i)$. Since in NL $\neg A \vdash$ not $A$, this derives not $a(\bar{t}, y_i)$, which contradicts the NL counterpart
not not $a(\bar{t}, y_i)$ of (c). The only disjunct left in (b) is $a(\bar{t}, y_i)$.

If $l$ is of the form $\neg a(\bar{t}, y_i)$ then (9) follows from (a) and (b).
Case 2: $intervene(a(\bar{t})) \in X$.

This implies that there is some $y_i$ such that $do(a(\bar{t}) = y_i) \in T$.

If $l$ is of the form $a(\bar{t}) = y$ then, since $T \cup obs(B)$ is coherent, we have that $y = y_i$, and thus $Q_B$ and $Q_{obs(B)}$ are
identical.

If $l$ is of the form $a(\bar{t}) \neq y$ then, since $T \cup obs(B)$ is coherent, we have that $y \neq y_i$.
Let $U_1(l)$ consist of rules:

$\neg a(\bar{t}, y) \leftarrow a(\bar{t}, y_i)$.
$a(\bar{t}, y_i)$.

Let $U_2(l) = U_1(l) \cup \{\neg a(\bar{t}, y)\}$.

Obviously $U_1(l) \subseteq Q_{obs(B)}$, $U_2(l) \subseteq Q_B$, and $U_1(l)$ entails $U_2(l)$ in NL. Hence we have (7) and therefore (4).
This concludes the proof. 

10 Appendix II: Causal Bayesian Networks 

This section gives a definition of causal Bayesian networks, closely following the definition of Judea Pearl and 
equivalent to the definition given in (Pearl 2000). Pearl's definition reflects the intuition that causal influence can
be elucidated, and distinguished from mere correlation, by controlled experiments, in which one or more variables 
are deliberately manipulated while other variables are left to their normal behavior. For example, there is a strong
correlation between smoking and lung cancer, but it could be hypothesized that this correlation is due to a genetic 
condition which tends to cause both lung cancer and a susceptibility to cigarette addiction. Evidence of a causal 
link could be obtained, for example, by a controlled experiment in which one randomly selected group of people 
would be forced to smoke, another group selected in the same way would be forced not to, and cancer rates 
measured among both groups (not that we recommend such an experiment). The definitions below characterize 
causal links among a collection V of variables in terms of the numerical properties of probability measures on V 
in the presence of interventions. Pearl gives the name "interventional distribution" to a function from interventions 
to probability measures. Given an interventional distribution P*, the goal is to describe conditions under which a
set of causal links, represented by a DAG, agrees with the probabilistic and causal information contained in P*. In 
this case the DAG will be called a causal Bayesian network compatible with P*. 

We begin with some preliminary definitions. Let V be a finite set of variables, where each v in V takes values
from some finite set D(v). By an assignment on V we mean a function which maps each v in V to some member
of D(v). We will let A(V) denote the set of all assignments on V. Assignments on V may also be called possible
worlds of V.

A partial assignment on V is an assignment on a subset of V. We will say two partial assignments are consistent 
if they do not assign different values to the same variable. Partial assignments can also be called interventions. 
Let Interv(V) be the set of all interventions on V, and let { } denote the empty intervention, that is, the unique
assignment on the empty set of variables. 
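
The definitions above can be made concrete with a small sketch. The code below is only an illustration under our own assumptions: the function names `assignments`, `interventions`, and `consistent` are hypothetical, and variables are represented as dictionary keys. It enumerates A(V) and Interv(V) for a two-variable example and checks consistency of partial assignments.

```python
from itertools import product, combinations

def assignments(domains):
    """All total assignments on the variables in `domains` (the set A(V))."""
    names = sorted(domains)
    return [dict(zip(names, values))
            for values in product(*(domains[n] for n in names))]

def interventions(domains):
    """All partial assignments on V, including the empty intervention {}."""
    names = sorted(domains)
    result = []
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            # An intervention is a total assignment on a subset of V.
            result.extend(assignments({n: domains[n] for n in subset}))
    return result

def consistent(r, s):
    """Two partial assignments are consistent if they never disagree."""
    return all(s[v] == x for v, x in r.items() if v in s)

# Hypothetical example: two Boolean variables a and d.
domains = {"a": [True, False], "d": [True, False]}
worlds = assignments(domains)
interv = interventions(domains)
```

With two Boolean variables there are four possible worlds and nine interventions (one empty, four on a single variable, four on both).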

By a probability measure on V we mean a function P which maps every set of possible worlds of V to a real
number in [0, 1] and satisfies the Kolmogorov Axioms. 

When P is a probability measure on V, the arguments of P are sets of possible worlds of V. However, these 
sets are often written as constraints which determine their members. So, for example, we write P(v = x) for the
probability of the set of all possible worlds of V which assign x to v. 

The following definition captures when a DAG G is an "ordinary" (i.e., not-necessarily-causal) Bayesian network 
compatible with a given probability measure. The idea is that the graph G captures certain conditional inde- 
pendence information about the given variables. That is, given information about the observed values of certain 
variables, the graph captures which variables are relevant to particular inferences about other variables. Generally 
speaking, this may fail to reflect the directions of causality, because the laws of probability used to make these in- 
ferences (e.g., Bayes Theorem and the definition of conditional probability) do not distinguish causes from effects. 
For example if A has a causal influence on B, observations of A may be relevant to inferences about B in much 
the same way that observations of B are relevant to inferences about A. 

Definition 35 
[Compatible] 

Let P be a probability measure on V and let G be a DAG whose nodes are the variables in V. We say that P is 
compatible with G if, under P, every v in V is independent of its non-descendants in G, given its parents in G. □
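
One way to write the independence requirement of Definition 35 symbolically is sketched below. The names $\mathit{Pa}(v)$ and $\mathit{ND}(v)$, for the parents and non-descendants of v in G, are our own notation and not the paper's; the equation is meant to hold whenever the conditioning event has positive probability.

```latex
P\bigl(v = x \mid \mathit{Pa}(v) = \bar{y},\ N = \bar{z}\bigr)
  = P\bigl(v = x \mid \mathit{Pa}(v) = \bar{y}\bigr)
\qquad \text{for every } N \subseteq \mathit{ND}(v)
```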

We are now ready to define causal Bayesian networks. In the following definition, P* is thought of as a mapping 
from each possible intervention r to the probability measures on V resulting from performing r. P* is intended 
to capture a model of causal influence in a purely numerical way, and the definition relates this causal model to a 
DAG G. 

If G is a DAG and v is a vertex of G, let Parents(G, v) denote the parents of v in G.

Definition 36
[Causal Bayesian network]

Let P* map each intervention r in Interv(V) to a probability measure $P_r$ on V. Let G be a DAG whose vertices
are precisely the members of V. We say that G is a causal Bayesian network compatible with P* if for every
intervention r in Interv(V),

1. $P_r$ is compatible with G,

2. $P_r(v = x) = 1$ whenever $r(v) = x$, and

3. whenever r does not assign a value to v, and s is an assignment on Parents(G, v) consistent with r, we
have that for every $x \in D(v)$

$P_r(v = x \mid u = s(u) \text{ for all } u \in Parents(G, v)) = P_{\{\}}(v = x \mid u = s(u) \text{ for all } u \in Parents(G, v))$ □

Condition 1 says that regardless of which intervention r is performed, G is a Bayesian net compatible with the
resulting probability measure. Condition 2 says that when we perform an intervention on the variables of V,
the manipulated variables "obey" the intervention. Condition 3 says that the unmanipulated variables behave under 
the influence of their parents in the usual way, as if no manipulation had occurred. 

For example, consider V = {a, d}, D(a) = D(d) = {true, false}, and P* given by the following table:

intervention    {a, d}    {a, ¬d}    {¬a, d}    {¬a, ¬d}
{}              0.32      0.08       0.06       0.54
{a}             0.8       0.2        0          0
{¬a}            0         0          0.01       0.99
{d}             0.4       0          0.6        0
{¬d}            0         0.4        0          0.6
{a, d}          1         0          0          0
{a, ¬d}         0         1          0          0
{¬a, d}         0         0          1          0
{¬a, ¬d}        0         0          0          1

Footnote (on Condition 1): This part of the definition captures some intuition about causality. It entails that given
complete information about the factors immediately influencing a variable v (i.e., given the parents of v in G), the only
variables relevant to inferences about v are its effects and indirect effects (i.e., descendants of v in G), and that this
property holds regardless of the intervention performed.

The entries down the left margin give possible interventions, and each row defines the corresponding probability 
measure by giving the probabilities of the four singleton sets of possible worlds. Intuitively, the table represents 
P* derived from Example [TSl where a represents that the rat eats arsenic, and d represents that it dies. 

If G is the graph with a single directed arc from a to d, then one can verify that P* satisfies Conditions 1-3 of the
definition of Causal Bayesian Network. For example, if r = {a = true}, s = {a = true}, v = d, and x = true,
we can verify Condition 3 by computing its left and right hand sides using the first two rows of the table:

LHS = $P_{\{a\}}(d \mid a) = 0.8/(0.8 + 0.2) = 0.8$

RHS = $P_{\{\}}(d \mid a) = 0.32/(0.32 + 0.08) = 0.8$

Now let G' be the graph with a single directed arc from d to a. We can verify that P* fails to satisfy Condition 3
for G' with r = {a = true}, v = d, x = true, and s the empty assignment, viz.,

LHS = $P_{\{a\}}(d) = 0.8 + 0 = 0.8$

RHS = $P_{\{\}}(d) = 0.32 + 0.06 = 0.38$

This tells us that P* given by the table is not compatible with the hypothesis that the rat's eating arsenic is caused 
by its death. 
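
The two checks of Condition 3 can be reproduced mechanically. The following is a minimal sketch, assuming a representation of our own choosing: worlds are (a, d) pairs, only the first two rows of the table are encoded, and the helper names `prob` and `cond` are hypothetical. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

# Worlds are (a, d) pairs; each row is the measure for one intervention.
# Only the rows for {} and {a} are needed for the checks in the text.
P = {
    "{}":  {(True, True): F(32, 100), (True, False): F(8, 100),
            (False, True): F(6, 100), (False, False): F(54, 100)},
    "{a}": {(True, True): F(8, 10), (True, False): F(2, 10),
            (False, True): F(0), (False, False): F(0)},
}

def prob(measure, event):
    """Probability of the set of worlds satisfying `event`."""
    return sum(p for world, p in measure.items() if event(world))

def cond(measure, event, given):
    """Conditional probability P(event | given)."""
    return prob(measure, lambda w: event(w) and given(w)) / prob(measure, given)

a_true = lambda w: w[0]
d_true = lambda w: w[1]

# Graph G (a -> d): Parents(G, d) = {a}, so we condition on a = true.
lhs_G = cond(P["{a}"], d_true, a_true)
rhs_G = cond(P["{}"], d_true, a_true)

# Graph G' (d -> a): d has no parents, so nothing is conditioned on.
lhs_G2 = prob(P["{a}"], d_true)
rhs_G2 = prob(P["{}"], d_true)
```

Under these assumptions `lhs_G` and `rhs_G` agree (both are 4/5), while `lhs_G2` is 4/5 and `rhs_G2` is 19/50, matching the mismatch reported for G'.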

Definition 36 leads to the following proposition that suggests a straightforward algorithm to compute probabilities 
with respect to a causal Bayes network with nodes $v_1, \ldots, v_k$, after an intervention r is done.

Proposition 8 (Pearl 2000)

Let G be a causal Bayesian network, with nodes $V = \{v_1, \ldots, v_k\}$, compatible with an interventional
distribution P*. Suppose also that r is an intervention in Interv(V), and the possible world $v_1 = x_1, \ldots, v_k = x_k$
is consistent with r. Then

$P_r(v_1 = x_1, \ldots, v_k = x_k) = \prod_{i \,:\, r(v_i) \text{ is not defined}} P_{\{\}}(v_i = x_i \mid pa_i(x_1, \ldots, x_k))$

where $pa_i(x_1, \ldots, x_k)$ is the unique assignment on $Parents(G, v_i)$ compatible with $v_1 = x_1, \ldots, v_k = x_k$.
□
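
The product in Proposition 8 can be sketched in a few lines. The code below is an illustration only: the dictionaries `parents` and `cpt` and the function `truncated_factorization` are hypothetical names, and the conditional probabilities are an assumption read off the first row of the example table (P(a) = 0.4, P(d | a) = 0.8, P(d | ¬a) = 0.06/0.6 = 0.1).

```python
from fractions import Fraction as F

# Hypothetical encoding of the graph a -> d. CPT rows are keyed by the
# tuple of parent values; entries are assumed from the row for {}.
parents = {"a": [], "d": ["a"]}
cpt = {
    "a": {(): {True: F(4, 10), False: F(6, 10)}},
    "d": {(True,): {True: F(8, 10), False: F(2, 10)},
          (False,): {True: F(1, 10), False: F(9, 10)}},
}

def truncated_factorization(world, r):
    """P_r(world): product of P_{}(v = x | parents) over variables not set by r."""
    if any(world[v] != x for v, x in r.items()):
        return F(0)  # a world that contradicts r has probability 0 (Condition 2)
    result = F(1)
    for v, pa in parents.items():
        if v in r:
            continue  # intervened variables contribute no factor
        key = tuple(world[u] for u in pa)
        result *= cpt[v][key][world[v]]
    return result

p_obs = truncated_factorization({"a": True, "d": True}, {})
p_do_a = truncated_factorization({"a": True, "d": True}, {"a": True})
p_do_d = truncated_factorization({"a": False, "d": True}, {"d": True})
```

With these assumed values the results are 8/25, 4/5, and 3/5, which agree with the table entries 0.32, 0.8, and 0.6 for the interventions {}, {a}, and {d}.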

Theorem 4 

Let G be a DAG with vertices $V = \{v_1, \ldots, v_k\}$ and P* be as defined in Definition 36. For an intervention r, let
do(r) denote the set $\{do(v_i = r(v_i)) : r(v_i) \text{ is defined}\}$.
Then there exists a P-log program $\pi$ with random attributes $v_1, \ldots, v_k$ such that for any intervention r in
Interv(V) and any assignment $v_1 = x_1, \ldots, v_k = x_k$ we have

$P_r(v_1 = x_1, \ldots, v_k = x_k) = P_{\pi \cup do(r)}(v_1 = x_1, \ldots, v_k = x_k)$ (32)

□

Proof: We will first give a road map of the proof. Our proof consists of the following four steps.

(i) First, given the antecedent in the statement of the theorem, we will construct a P-log program $\pi$ which, as we
will ultimately show, satisfies (32).

(ii) Next, we will construct a P-log program $\pi(r)$ and show that:

$P_{\pi \cup do(r)}(v_1 = x_1, \ldots, v_k = x_k) = P_{\pi(r)}(v_1 = x_1, \ldots, v_k = x_k)$ (33)

(iii) Next, we will construct a finite Bayes net G(r) that defines a probability distribution P' and show that:

$P_{\pi(r)}(v_1 = x_1, \ldots, v_k = x_k) = P'(v_1 = x_1, \ldots, v_k = x_k)$ (34)

(iv) Then we will use Proposition 1 to argue that:

$P'(v_1 = x_1, \ldots, v_k = x_k) = P_r(v_1 = x_1, \ldots, v_k = x_k)$ (35)

(32) then follows from (33), (34) and (35).
We now elaborate on the steps (i)-(iv).

Step (i) Given the antecedent in the statement of the theorem, we will construct a P-log program $\pi$ as follows:

(a) For each variable $v_i$ in V, $\pi$ contains:

$random(v_i)$.
$v_i : D(v_i)$.

where $D(v_i)$ is the domain of $v_i$.

(b) For any $v_i \in V$ such that $parents(G, v_i) = \{v_{i_1}, \ldots, v_{i_m}\}$, any $y \in D(v_i)$, and any $x_{i_1}, \ldots, x_{i_m}$ in
$D(v_{i_1}), \ldots, D(v_{i_m})$ respectively, $\pi$ contains the pr-atom:

$pr(v_i = y \mid_c v_{i_1} = x_{i_1}, \ldots, v_{i_m} = x_{i_m}) = P_{\{\}}(v_i = y \mid v_{i_1} = x_{i_1}, \ldots, v_{i_m} = x_{i_m})$.
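
The construction in Step (i) is mechanical, and a small generator makes this visible. The sketch below is only an illustration: `plog_program` is a hypothetical helper, the output is a plain-text rendering that is not claimed to match the concrete syntax of any P-log system, and the numbers are assumed from the first row of the example table.

```python
from itertools import product

def plog_program(domains, parents, cpt):
    """Return the statements of pi as strings (illustrative rendering only)."""
    lines = []
    # Part (a): a random selection rule and a domain declaration per variable.
    for v in domains:
        lines.append(f"random({v}).")
        lines.append(f"{v} : {{{', '.join(domains[v])}}}.")
    # Part (b): one pr-atom per variable, value, and parent configuration.
    for v in domains:
        pa = parents[v]
        for xs in product(*(domains[u] for u in pa)):
            for y in domains[v]:
                body = ", ".join(f"{u} = {x}" for u, x in zip(pa, xs))
                sep = f" |c {body}" if body else ""
                lines.append(f"pr({v} = {y}{sep}) = {cpt[v][xs][y]}.")
    return lines

# Hypothetical input: the graph a -> d with assumed probabilities.
domains = {"a": ["true", "false"], "d": ["true", "false"]}
parents = {"a": [], "d": ["a"]}
cpt = {
    "a": {(): {"true": "0.4", "false": "0.6"}},
    "d": {("true",): {"true": "0.8", "false": "0.2"},
          ("false",): {"true": "0.1", "false": "0.9"}},
}
program = plog_program(domains, parents, cpt)
```

For this input the generator produces ten statements: four from part (a) and six pr-atoms from part (b).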

Step (ii) Given the antecedent in the statement of the theorem, and an intervention r in Interv(V), we will now
construct a P-log program $\pi(r)$ and show that (33) is true.

(a) For each variable $v_i$ in V, if $r(v_i)$ is not defined, then $\pi(r)$ contains $random(v_i)$ and $v_i : D(v_i)$, where $D(v_i)$
is the domain of $v_i$.

(b) The pr-atoms in $\pi(r)$ are as follows. For any node $v_i$ such that $r(v_i)$ is not defined, let $\{v_{i_{j_1}}, \ldots, v_{i_{j_k}}\}$ consist
of all elements of $parents(G, v_i) = \{v_{i_1}, \ldots, v_{i_m}\}$ where r is not defined. Then the following pr-atom is in $\pi(r)$:

$pr(v_i = x \mid_c v_{i_{j_1}} = y_{i_{j_1}}, \ldots, v_{i_{j_k}} = y_{i_{j_k}}) = P_{\{\}}(v_i = x \mid v_{i_1} = y_{i_1}, \ldots, v_{i_m} = y_{i_m})$,

where for all $v_{i_p} \in parents(G, v_i)$, if $r(v_{i_p})$ is defined then $y_{i_p} = r(v_{i_p})$.

Now let us compare the P-log programs $\pi \cup do(r)$ and $\pi(r)$. Their pr-atoms differ. In addition, for a variable $v_i$,
if $r(v_i)$ is defined then $\pi \cup do(r)$ has $do(v_i = r(v_i))$ and $random(v_i)$, while $\pi(r)$ has neither. For variables
$v_j$ where $r(v_j)$ is not defined, both $\pi \cup do(r)$ and $\pi(r)$ have $random(v_j)$. It is easy to see that there is a one-
to-one correspondence between possible worlds of $\pi \cup do(r)$ and $\pi(r)$; for any possible world $W$ of $\pi \cup do(r)$
the corresponding possible world $W'$ for $\pi(r)$ can be obtained by projecting on the atoms about variables $v_j$ for
which $r(v_j)$ is not defined. For a $v_i$ for which $r(v_i)$ is defined, $W$ will contain $intervene(v_i)$, and will not have
an assigned probability. The default probability $PD(W, v_i = r(v_i))$ will be $\frac{1}{|D(v_i)|}$. Now it is easy to see that the
unnormalized probability measure associated with $W$ will be

$\prod_{v_i \,:\, r(v_i) \text{ is defined}} \frac{1}{|D(v_i)|}$

times the unnormalized probability measure associated with $W'$, and hence their normalized probability measures
will be the same. Thus $P_{\pi \cup do(r)}(v_1 = x_1, \ldots, v_k = x_k) = P_{\pi(r)}(v_1 = x_1, \ldots, v_k = x_k)$.

Step (iii) Given G, P* and any intervention r in Interv(V) we will construct a finite Bayes net G(r). Let P'
denote the probability with respect to this Bayes net.

The nodes and edges of G(r) are as follows. All vertices $v_i$ in G such that $r(v_i)$ is not defined are the only vertices
in G(r). For any edge from $v_i$ to $v_j$ in G, only if $r(v_j)$ is not defined the edge from $v_i$ to $v_j$ is also an edge in
G(r). No other edges are in G(r). The conditional probability associated with the Bayes net G(r) is as follows:
For any node $v_i$ of G(r), let $parents(G(r), v_i) = \{v_{i_{j_1}}, \ldots, v_{i_{j_k}}\} \subseteq parents(G, v_i) = \{v_{i_1}, \ldots, v_{i_m}\}$. We define
the conditional probability $p(v_i = x \mid v_{i_{j_1}} = y_{i_{j_1}}, \ldots, v_{i_{j_k}} = y_{i_{j_k}}) = P_{\{\}}(v_i = x \mid v_{i_1} = y_{i_1}, \ldots, v_{i_m} = y_{i_m})$,
where for all $v_{i_p} \in parents(G, v_i)$, if $r(v_{i_p})$ is defined (i.e., $v_{i_p} \notin parents(G(r), v_i)$) then $y_{i_p} = r(v_{i_p})$.

From Theorem 2, which shows the equivalence between a Bayes net and a representation of it in P-log, which we
will denote by $\pi(G(r))$, we know that $P'(v_1 = x_1, \ldots, v_k = x_k) = P_{\pi(G(r))}(v_1 = x_1, \ldots, v_k = x_k)$. It is easy
to see that $\pi(G(r))$ is the same as $\pi(r)$. Hence (34) holds.
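
The graph G(r) of Step (iii) can be sketched as follows. This is an illustration under our own reading, in which an edge survives only when both of its endpoints remain vertices of G(r); the names `mutilated_graph` and `reduced_cpt_row` and the example numbers are hypothetical.

```python
def mutilated_graph(vertices, edges, r):
    """Vertices and edges of G(r): drop every variable that r assigns."""
    kept = [v for v in vertices if v not in r]
    kept_edges = [(u, v) for (u, v) in edges if u in kept and v in kept]
    return kept, kept_edges

def reduced_cpt_row(v, parents, cpt, r, free_parent_values):
    """CPT row of v in G(r): intervened parents are pinned to their r-values,
    the remaining parents take the values given in `free_parent_values`."""
    full = tuple(r[u] if u in r else free_parent_values[u] for u in parents[v])
    return cpt[v][full]

# Hypothetical example: the graph a -> d and the intervention {a = true}.
vertices = ["a", "d"]
edges = [("a", "d")]
parents = {"a": [], "d": ["a"]}
cpt = {
    "a": {(): {True: 0.4, False: 0.6}},
    "d": {(True,): {True: 0.8, False: 0.2},
          (False,): {True: 0.1, False: 0.9}},
}
r = {"a": True}

kept, kept_edges = mutilated_graph(vertices, edges, r)
row_d = reduced_cpt_row("d", parents, cpt, r, {})
```

Here G(r) consists of the single vertex d with no edges, and the distribution of d in G(r) is the row of the original table for a = true.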

Step (iv) It is easy to see that $P'(v_1 = x_1, \ldots, v_k = x_k)$ is equal to the right hand side of Proposition 1. Hence
(35) holds.



11 Appendix III: Semantics of ASP 

In this section we review the semantics of ASP. Recall that an ASP rule is a statement of the form

$l_0$ or ... or $l_k \leftarrow l_{k+1}, \ldots, l_m,$ not $l_{m+1}, \ldots,$ not $l_n$ (36)

where the $l_i$'s are ground literals over some signature $\Sigma$. An ASP program, $\Pi$, is a collection of such rules over
some signature $\sigma(\Pi)$, and a partial interpretation of $\sigma(\Pi)$ is a consistent set of ground literals of the signature. A
program with variables is considered shorthand for the set of all ground instantiations of its rules. The answer set
semantics of a logic program $\Pi$ assigns to $\Pi$ a collection of answer sets, each of which is a partial interpretation
of $\sigma(\Pi)$ corresponding to some possible set of beliefs which can be built by a rational reasoner on the basis of
rules of $\Pi$. As mentioned in the introduction, in the construction of such a set, $S$, the reasoner should satisfy the
rules of $\Pi$ and adhere to the rationality principle which says that one shall not believe anything one is not forced
to believe. A partial interpretation $S$ satisfies Rule (36) if whenever $l_{k+1}, \ldots, l_m$ are in $S$ and none of $l_{m+1}, \ldots, l_n$
are in $S$, the set $S$ contains at least one $l_i$ where $0 \leq i \leq k$. The definition of an answer set of a logic program is
given in two steps:

First we consider a program $\Pi$ not containing default negation not.

Definition 37 
(Answer set - part one) 

A partial interpretation $S$ of the signature $\sigma(\Pi)$ of $\Pi$ is an answer set for $\Pi$ if $S$ is minimal (in the sense of
set-theoretic inclusion) among the partial interpretations of $\sigma(\Pi)$ satisfying the rules of $\Pi$. □

The rationality principle is captured in this definition by the minimality requirement. 

To extend the definition of answer sets to arbitrary programs, take any program $\Pi$, and let $S$ be a partial
interpretation of $\sigma(\Pi)$. The reduct $\Pi^S$ of $\Pi$ relative to $S$ is obtained by

1. removing from $\Pi$ all rules containing not $l$ such that $l \in S$, and then

2. removing all literals of the form not $l$ from the remaining rules.

Thus $\Pi^S$ is a program without default negation.

Definition 38
(Answer set - part two)

A partial interpretation $S$ of $\sigma(\Pi)$ is an answer set for $\Pi$ if $S$ is an answer set for $\Pi^S$. □

The relationship between this fix-point definition and the informal principles which form the basis for the notion 
of answer set is given by the following proposition. 

Proposition 9 

Baral and Gelfond (Baral and Gelfond 1994)
Let $S$ be an answer set of ASP program $\Pi$.

(a) $S$ satisfies the rules of the ground instantiation of $\Pi$.

(b) If literal $l \in S$ then there is a rule r from the ground instantiation of $\Pi$ such that the body of r is satisfied by $S$
and $l$ is the only literal in the head of r satisfied by $S$. □

The rule r from (b) "forces" the reasoner to believe $l$.

It is easy to check that program p(a) or p(b) has two answer sets, {p(a)} and {p(b)}, and
program p(a) ← not p(b) has one answer set, {p(a)}. Program $P_1$ from the introduction indeed has one
answer set {p(a), ¬p(b), q(c)}, while program $P_2$ has two answer sets, {p(a), ¬p(b), p(c), ¬q(c)} and
{p(a), ¬p(b), ¬p(c), ¬q(c)}.
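
Definitions 37 and 38 can be checked by brute force on small ground programs. The sketch below is an illustration, not an efficient solver: rules are encoded as hypothetical triples (head, positive body, negated body) of literal strings, and classical negation is written with a leading "-". It reproduces the first two examples of the paragraph above.

```python
from itertools import combinations

def complement(lit):
    """Classical complement: p <-> -p."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def consistent(s):
    return all(complement(l) not in s for l in s)

def satisfies(s, rule):
    """S satisfies a rule if the body fails or some head literal is in S."""
    head, pos, neg = rule
    body_holds = all(l in s for l in pos) and not any(l in s for l in neg)
    return (not body_holds) or any(l in s for l in head)

def reduct(program, s):
    """Drop rules containing 'not l' with l in S, then drop remaining 'not' literals."""
    return [(h, p, []) for (h, p, n) in program if not any(l in s for l in n)]

def is_answer_set(program, s):
    red = reduct(program, s)
    if not consistent(s) or not all(satisfies(s, r) for r in red):
        return False
    # Minimality: no proper subset of S satisfies the reduct.
    for k in range(len(s)):
        for sub in combinations(sorted(s), k):
            if all(satisfies(set(sub), r) for r in red):
                return False
    return True

def answer_sets(program):
    literals = sorted({l for h, p, n in program for l in h + p + n})
    return [set(c) for k in range(len(literals) + 1)
            for c in combinations(literals, k)
            if is_answer_set(program, set(c))]

disjunction = [(["p(a)", "p(b)"], [], [])]   # p(a) or p(b).
default = [(["p(a)"], [], ["p(b)"])]         # p(a) <- not p(b).
sets_disjunction = answer_sets(disjunction)
sets_default = answer_sets(default)
```

The search enumerates every subset of the literals occurring in the program, so it is exponential and only suitable for tiny examples such as these.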

Note that the left-hand side (the head) of an ASP rule can be empty. In this case the rule is often referred to as a
constraint or denial. The denial ← B prohibits the agent associated with the program from having a set of beliefs
satisfying B. For instance, program p(a) or ¬p(a) has two answer sets, {p(a)} and {¬p(a)}. The addition of
a denial ← p(a) eliminates the former; {¬p(a)} is the only answer set of the remaining program. Every answer
set of a consistent program $\Pi \cup \{l.\}$ contains $l$ while a program $\Pi \cup \{\leftarrow \text{not } l.\}$ may be inconsistent. While the
former tells the reasoner to believe that $l$ is true, the latter requires him to find support of his belief in $l$ from $\Pi$. If,
say, $\Pi$ is empty then the first program has the answer set $\{l\}$ while the second has no answer sets. If $\Pi$ consists of
the default $\neg l \leftarrow$ not $l$ then the first program has the answer set $\{l\}$ while the second again has no answer sets.

Some additional insight into the difference between $l$ and $\leftarrow$ not $l$ can also be obtained from the relationship
between ASP and intuitionistic or constructive logic (Ferraris and Lifschitz 2005), which distinguishes between $l$
and $\neg\neg l$. In the corresponding mapping the denial corresponds to the double negation of $l$.

To better understand the role of denials in ASP one can view a program $\Pi$ as divided into two parts: $\Pi_r$ consisting
of rules with non-empty heads and $\Pi_d$ consisting of the denials of $\Pi$. One can show that $S$ is an answer set of
$\Pi$ iff it is an answer set of $\Pi_r$ which satisfies all the denials from $\Pi_d$. This property is often exploited in answer
set programming where the initial knowledge about the domain is often defined by $\Pi_r$ and the corresponding
computational problem is posed as the task of finding answer sets of $\Pi_r$ satisfying the denials from $\Pi_d$.
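
The role of denials as filters on answer sets can be illustrated on the example p(a) or ¬p(a) with the denial ← p(a). The code below is a compact brute-force sketch under our own encoding: rules are hypothetical (head, positive body, negated body) triples, "-" marks classical negation, and `pi_r` and `pi_d` are illustrative names for the two parts of the program.

```python
from itertools import combinations

def holds(s, rule):
    """S satisfies a rule; a denial (empty head) holds iff its body fails."""
    head, pos, neg = rule
    body = all(l in s for l in pos) and not any(l in s for l in neg)
    return not body or any(l in s for l in head)

def answer_sets(program):
    lits = sorted({l for h, p, n in program for l in h + p + n})
    out = []
    for k in range(len(lits) + 1):
        for cand in map(set, combinations(lits, k)):
            if any(("-" + l) in cand for l in cand):
                continue  # skip inconsistent candidates
            # Reduct of the program relative to the candidate set.
            red = [(h, p, []) for h, p, n in program if not (set(n) & cand)]
            proper_subsets = (set(c) for j in range(len(cand))
                              for c in combinations(sorted(cand), j))
            if all(holds(cand, r) for r in red) and \
               not any(all(holds(t, r) for r in red) for t in proper_subsets):
                out.append(cand)
    return out

pi_r = [(["p(a)", "-p(a)"], [], [])]   # p(a) or -p(a).
pi_d = [([], ["p(a)"], [])]            # <- p(a).

# Answer sets of the whole program, computed directly ...
direct = answer_sets(pi_r + pi_d)
# ... and as answer sets of the non-denial part that satisfy the denials.
filtered = [s for s in answer_sets(pi_r) if all(holds(s, d) for d in pi_d)]
```

Both computations leave {¬p(a)} as the only answer set. The same function also shows that the program consisting only of ← not l has no answer sets.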